UCF-STAR: A Large Scale Still Image Dataset for Understanding Human Actions

Authors: Marjaneh Safaei, Pooyan Balouchian, Hassan Foroosh (pp. 2677-2684)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To benchmark and demonstrate the benefits of UCF-STAR as a large-scale dataset, and to show the role of latent motion information in recognizing human actions in still images, we present a novel approach relying on predicting temporal information, yielding higher accuracy on 5 widely-used datasets.
Researcher Affiliation | Academia | Marjaneh Safaei, Pooyan Balouchian, Hassan Foroosh, Department of Computer Science, University of Central Florida (UCF), Orlando, FL 32816-2362, {marjaneh.safaei, pooyan}@knights.ucf.edu, Hassan.Foroosh@ucf.edu
Pseudocode | No | The paper describes its methods textually and mathematically but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an unambiguous statement about releasing source code for the described methodology, nor does it include a direct link to a code repository.
Open Datasets | Yes | We fully compare UCF-STAR with existing image datasets in terms of their challenges. Stanford-40 (Yao et al. 2011) contains 40 classes and 9,532 images. Willow (Delaitre, Laptev, and Sivic 2010)... WIDER (Xiong et al. 2015)... BU-101 (Ma et al. 2017).
Dataset Splits | Yes | We further split the 1,038,622 images into mutually exclusive 664,718 training, 166,180 validation, and 207,724 test images.
Hardware Specification | No | The paper does not provide specific hardware details (such as exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions tools and architectures like "Bing's Cognitive Services API" and "Stacked Hourglass Networks" but does not provide specific version numbers for any software dependencies or libraries needed to replicate the experiment.
Experiment Setup | Yes | Each stream is formed by sixteen successive convolutional layers followed by three fully connected layers. We denote the convolutional layers as CON(k, s), indicating that there are k kernels of size s × s. The input to our CNN is a fixed-size 224 × 224 image. The convolution stride is fixed to 1 pixel. Max-pooling is performed over a 2 × 2 pixel window, with stride 2. Finally, FC(n) denotes a fully connected layer with n neurons. We change the last FC layer, and used smaller learning rates for layers that are being fine-tuned...
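The quoted split counts can be sanity-checked directly: the three partitions sum to the stated 1,038,622 images and correspond to roughly a 64/16/20 train/validation/test division (the percentages are not stated in the paper; they follow from the counts).

```python
# Sanity-check the UCF-STAR train/val/test split counts quoted above.
train, val, test = 664_718, 166_180, 207_724
total = train + val + test
assert total == 1_038_622  # matches the stated dataset size

ratios = [round(n / total, 2) for n in (train, val, test)]
print(ratios)  # -> [0.64, 0.16, 0.2], i.e. roughly a 64/16/20 split
```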
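The per-stream layout quoted for Experiment Setup (sixteen stride-1 convolutional layers plus three FC layers, 224 × 224 input, 2 × 2 max-pooling with stride 2) matches a VGG-19-style stream. A minimal sketch of the resulting spatial-dimension arithmetic, assuming 3 × 3 kernels with "same" padding and a VGG-19-like 2-2-4-4-4 block layout with five pooling stages (the paper states only the layer totals, not the block grouping):

```python
# Trace feature-map sizes through a VGG-19-style stream: sixteen stride-1
# convolutions ("same" padding assumed) interleaved with five 2x2/stride-2
# max-pools. The per-block conv counts are an assumption modeled on VGG-19.
def conv_out(size, kernel=3, stride=1, pad=1):
    """Standard convolution output-size formula."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    """Max-pooling output-size formula."""
    return (size - window) // stride + 1

size = 224  # fixed-size input, per the paper
blocks = [2, 2, 4, 4, 4]  # conv layers per block (assumed; 16 in total)
for n_convs in blocks:
    for _ in range(n_convs):
        size = conv_out(size)  # stride-1 "same" conv preserves spatial size
    size = pool_out(size)      # each 2x2/stride-2 pool halves it

print(size)  # -> 7; the 7x7 maps are flattened into the three FC layers
```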