UCF-STAR: A Large Scale Still Image Dataset for Understanding Human Actions
Authors: Marjaneh Safaei, Pooyan Balouchian, Hassan Foroosh (pp. 2677-2684)
AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To benchmark and demonstrate the benefits of UCF-STAR as a large-scale dataset, and to show the role of latent motion information in recognizing human actions in still images, we present a novel approach relying on predicting temporal information, yielding higher accuracy on 5 widely-used datasets. |
| Researcher Affiliation | Academia | Marjaneh Safaei, Pooyan Balouchian, Hassan Foroosh Department of Computer Science University of Central Florida (UCF) Orlando, FL 32816-2362 {marjaneh.safaei, pooyan}@knights.ucf.edu, Hassan.Foroosh@ucf.edu |
| Pseudocode | No | The paper describes its methods textually and mathematically but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an unambiguous statement about releasing source code for the described methodology, nor does it include a direct link to a code repository. |
| Open Datasets | Yes | We fully compare UCF-STAR with existing image datasets in terms of their challenges. Stanford-40 (Yao et al. 2011) contains 40 classes and 9,532 images. Willow (Delaitre, Laptev, and Sivic 2010)... WIDER (Xiong et al. 2015)... BU-101 (Ma et al. 2017). |
| Dataset Splits | Yes | We further split the 1,038,622 images into mutually exclusive 664,718 training, 166,180 validation, and 207,724 test images. |
| Hardware Specification | No | The paper does not provide specific hardware details (such as exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions tools and architectures like 'Bing's Cognitive Services API' and 'Stacked Hourglass Networks' but does not provide specific version numbers for any software dependencies or libraries needed to replicate the experiment. |
| Experiment Setup | Yes | Each stream is formed by sixteen successive convolutional layers followed by three fully connected layers. We denote the convolutional layers as CON(k,s), indicating that there are k kernels of size s × s. The input to our CNN is a fixed-size 224 × 224 image. The convolution stride is fixed to 1 pixel. Max-pooling is performed over a 2 × 2 pixel window, with stride 2. Finally, FC(n) denotes a fully connected layer with n neurons. We changed the last FC layer and used smaller learning rates for layers that are being fine-tuned... |
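The per-stream layout quoted above (sixteen CON(k,s) layers with stride 1, 2 × 2 max-pooling with stride 2, three FC layers, 224 × 224 input) matches the VGG-19 pattern, and its feature-map dimensions can be checked with a short sketch. The block structure, kernel counts, and the assumption of five pooling stages and "same" padding are inferred from that pattern, not stated in the paper, which gives only the layer totals:

```python
# Sketch of the per-stream CNN layout described in the reproducibility table.
# ASSUMPTIONS (not stated in the paper): a VGG-19-style grouping of the
# sixteen conv layers into five blocks, "same" padding, and five 2x2/stride-2
# max-pools. The paper only specifies the totals and the input size.

def feature_map_size(input_size=224, num_pools=5):
    """Spatial size after the conv stack: stride-1 'same' convolutions
    preserve size, and each 2x2 max-pool with stride 2 halves it."""
    size = input_size
    for _ in range(num_pools):
        size //= 2
    return size

# Hypothetical per-block plan: (number of CON layers, k kernels per layer).
blocks = [(2, 64), (2, 128), (4, 256), (4, 512), (4, 512)]
total_convs = sum(n for n, _ in blocks)

print(total_convs)          # 16 convolutional layers, as the paper states
print(feature_map_size())   # 7 -> the first FC layer would see 7*7*512 inputs
```

Under these assumptions the conv stack reduces a 224 × 224 input to a 7 × 7 map before the three FC layers, consistent with fine-tuning a VGG-style backbone where only the final FC layer is replaced for the new class count.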