UCF-STAR: A Large Scale Still Image Dataset for Understanding Human Actions

Authors: Marjaneh Safaei, Pooyan Balouchian, Hassan Foroosh (pp. 2677-2684)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To benchmark and demonstrate the benefits of UCF-STAR as a large-scale dataset, and to show the role of latent motion information in recognizing human actions in still images, we present a novel approach relying on predicting temporal information, yielding higher accuracy on 5 widely-used datasets.
Researcher Affiliation | Academia | Marjaneh Safaei, Pooyan Balouchian, Hassan Foroosh, Department of Computer Science, University of Central Florida (UCF), Orlando, FL 32816-2362, {marjaneh.safaei, pooyan}@knights.ucf.edu, Hassan.Foroosh@ucf.edu
Pseudocode | No | The paper describes its methods textually and mathematically but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an unambiguous statement about releasing source code for the described methodology, nor does it include a direct link to a code repository.
Open Datasets | Yes | We fully compare UCF-STAR with existing image datasets in terms of their challenges. Stanford-40 (Yao et al. 2011) contains 40 classes and 9,532 images. Willow (Delaitre, Laptev, and Sivic 2010)... WIDER (Xiong et al. 2015)... BU-101 (Ma et al. 2017).
Dataset Splits | Yes | We further split the 1,038,622 images into mutually exclusive 664,718 training, 166,180 validation, and 207,724 test images.
Hardware Specification | No | The paper does not provide specific hardware details (such as exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions tools and architectures like "Bing's Cognitive Services API" and "Stacked Hourglass Networks" but does not provide specific version numbers for any software dependencies or libraries needed to replicate the experiment.
Experiment Setup | Yes | Each stream is formed by sixteen successive convolutional layers followed by three fully connected layers. We denote the convolutional layers as CON(k, s), indicating that there are k kernels of size s × s. The input to our CNN is a fixed-size 224 × 224 image. The convolution stride is fixed to 1 pixel. Max-pooling is performed over a 2 × 2 pixel window, with stride 2. Finally, FC(n) denotes a fully connected layer with n neurons. We change the last FC layer, and used smaller learning rates for layers that are being fine-tuned...
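The quoted split counts can be sanity-checked directly: the three partitions sum to the stated 1,038,622 images and correspond to roughly a 64/16/20 train/validation/test division (the percentages are not stated in the paper; they follow from the counts).

```python
# Sanity-check the UCF-STAR train/val/test split counts quoted above.
train, val, test = 664_718, 166_180, 207_724
total = train + val + test
assert total == 1_038_622  # matches the stated dataset size

ratios = [round(n / total, 2) for n in (train, val, test)]
print(ratios)  # -> [0.64, 0.16, 0.2], i.e. roughly a 64/16/20 split
```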
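The per-stream layout quoted for Experiment Setup (sixteen stride-1 convolutional layers plus three FC layers, 224 × 224 input, 2 × 2 max-pooling with stride 2) matches a VGG-19-style stream. A minimal sketch of the resulting spatial-dimension arithmetic, assuming 3 × 3 kernels with "same" padding and a VGG-19-like 2-2-4-4-4 block layout with five pooling stages (the paper states only the layer totals, not the block grouping):

```python
# Trace feature-map sizes through a VGG-19-style stream: sixteen stride-1
# convolutions ("same" padding assumed) interleaved with five 2x2/stride-2
# max-pools. The per-block conv counts are an assumption modeled on VGG-19.
def conv_out(size, kernel=3, stride=1, pad=1):
    """Standard convolution output-size formula."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    """Max-pooling output-size formula."""
    return (size - window) // stride + 1

size = 224  # fixed-size input, per the paper
blocks = [2, 2, 4, 4, 4]  # conv layers per block (assumed; 16 in total)
for n_convs in blocks:
    for _ in range(n_convs):
        size = conv_out(size)  # stride-1 "same" conv preserves spatial size
    size = pool_out(size)      # each 2x2/stride-2 pool halves it

print(size)  # -> 7; the 7x7 maps are flattened into the three FC layers
```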