Multimodal Keyless Attention Fusion for Video Classification

Authors: Xiang Long, Chuang Gan, Gerard de Melo, Xiao Liu, Yandong Li, Fu Li, Shilei Wen

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experiment on four highly heterogeneous datasets, UCF101, ActivityNet, Kinetics, and YouTube-8M to validate our conclusion, and show that our approach achieves highly competitive results.
Researcher Affiliation | Collaboration | Xiang Long (1), Chuang Gan (1), Gerard de Melo (2), Xiao Liu (3), Yandong Li (3), Fu Li (3), Shilei Wen (3); (1) Tsinghua University, (2) Rutgers University, (3) Baidu IDL
Pseudocode | No | The paper describes mathematical equations for its models (e.g., LSTM equations 4-9), but it does not contain a structured pseudocode block or a clearly labeled algorithm figure. (A hedged sketch of the keyless attention mechanism appears after the table.)
Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for the described methodology or a link to a code repository.
Open Datasets | Yes | We evaluate our approach on four popular video classification datasets. UCF101 (Soomro, Roshan Zamir, and Shah 2012) [...] ActivityNet (Heilbron et al. 2015) [...] Kinetics (Carreira and Zisserman 2017) [...] YouTube-8M (Abu-El-Haija et al. 2016)
Dataset Splits | Yes | UCF101: Following the original evaluation scheme, we report the average accuracy over three training/testing splits. ActivityNet: In the official split, the distribution among training, validation, and test data is about 50%, 25%, and 25% of the total videos, respectively. Kinetics: The dataset contains 246,535 training videos, 19,907 validation videos, and 38,685 test videos, covering 400 human action classes. YouTube-8M: In the official split, the distribution among training, validation, and test data is about 70%, 20%, and 10%, respectively.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions models and algorithms (e.g., 'ResNet-152', 'RMSPROP algorithm') but does not specify software dependencies with version numbers (e.g., Python, TensorFlow/PyTorch versions).
Experiment Setup | Yes | For UCF101 and ActivityNet, we extract both RGB and flow features using a ResNet-152 (He et al. 2016) model. For Kinetics, we extract RGB and flow features using Inception-ResNet-v2 (Szegedy et al. 2016) and extract audio features with a VGG-16 (Simonyan and Zisserman 2014a). The number of segments we used for fine-tuning is 3 for UCF101, and 7 for ActivityNet and Kinetics... We max-pool the frame-level features to 5 segment-level features for UCF101 and Kinetics... and 20 for ActivityNet... For YouTube-8M... the maximum number of segments is 300. The number of hidden units for the LSTM on UCF101, ActivityNet, and Kinetics is 512, while for YouTube-8M, we use 1024... with a learning rate of 0.0001. (Hedged sketches of the segment pooling and optimizer setup follow the table.)
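
As the Pseudocode row notes, the paper specifies its model through equations rather than pseudocode. The following is a minimal sketch of a keyless (query-free) attention pooling layer of the kind the title refers to, written in PyTorch: each timestep of an LSTM output scores itself and the scores are softmax-normalized over time. The class name, scoring network, and feature dimensions are illustrative assumptions, not the paper's exact equations.

```python
import torch
import torch.nn as nn

class KeylessAttentionPooling(nn.Module):
    """Attention pooling over a sequence without an external query ("keyless"):
    each timestep scores itself, and the scores are normalized over time.
    Illustrative sketch only; not the paper's exact formulation."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, 1, bias=False),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, time, hidden), e.g. the output sequence of an LSTM
        e = self.score(h)                 # (batch, time, 1) unnormalized scores
        a = torch.softmax(e, dim=1)       # attention weights over time
        return (a * h).sum(dim=1)         # (batch, hidden) pooled feature


# Usage with the hidden size reported for UCF101/ActivityNet/Kinetics (512);
# the 2048-d input and 5 segments per video are assumptions for illustration.
lstm = nn.LSTM(input_size=2048, hidden_size=512, batch_first=True)
pool = KeylessAttentionPooling(512)
x = torch.randn(8, 5, 2048)               # batch of 8 videos, 5 segments each
outputs, _ = lstm(x)
video_feature = pool(outputs)             # (8, 512)
```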
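The Experiment Setup row quotes concrete numbers (segment-level max pooling to 5 or 20 segments, 512/1024 hidden units, learning rate 0.0001), and the Software Dependencies row mentions the RMSPROP algorithm. The sketch below shows how the segment pooling and optimizer configuration might look; the chunking scheme, the 2048-d feature size, and the placeholder model are assumptions, not details taken from the paper.

```python
import torch

def max_pool_segments(frames: torch.Tensor, num_segments: int) -> torch.Tensor:
    """Max-pool frame-level features of shape (time, feat_dim) into
    num_segments segment-level features. Assumes time >= num_segments;
    the exact chunking scheme is a guess, not taken from the paper."""
    chunks = torch.chunk(frames, num_segments, dim=0)
    return torch.stack([c.max(dim=0).values for c in chunks], dim=0)

# Example: 120 frame features (2048-d, an assumed dimensionality) pooled to
# the 5 segments reported for UCF101 and Kinetics.
frames = torch.randn(120, 2048)
segments = max_pool_segments(frames, 5)    # -> shape (5, 2048)

# Optimizer matching the quoted hyperparameters: RMSProp, learning rate 1e-4.
# `model` is a placeholder single branch, not the full multimodal network.
model = torch.nn.LSTM(2048, 512, batch_first=True)
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4)
```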