ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering

Authors: Zhou Yu, Dejing Xu, Jun Yu, Ting Yu, Zhou Zhao, Yueting Zhuang, Dacheng Tao. Pages 9127-9134.

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present a statistical analysis of our ActivityNet-QA dataset and conduct extensive experiments on it by comparing existing VideoQA baselines.
Researcher Affiliation | Academia | 1) Key Laboratory of Complex Systems Modeling and Simulation, School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China; 2) College of Computer Science, Zhejiang University, Hangzhou, China; 3) UBTECH Sydney AI Centre, SIT, FEIT, University of Sydney, Australia.
Pseudocode | No | The paper describes procedures and models but does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code (a specific repository link, an explicit code-release statement, or code in supplementary materials) for the methodology it describes.
Open Datasets | Yes | Our dataset exploits 5,800 videos from the ActivityNet dataset, which contains about 20,000 untrimmed web videos representing 200 action classes (Fabian Caba Heilbron and Niebles 2015).
Dataset Splits | Yes | We use 3,200 videos and 32,000 corresponding QA pairs in the train split to train the models, and 1,800 videos and 18,000 corresponding QA pairs in the val split to tune hyper-parameters. We report the predicted results on 800 videos and 8,000 QA pairs in the test split.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for its experiments; it mentions only a general implementation in TensorFlow.
Software Dependencies | No | The paper mentions using TensorFlow and GloVe embeddings but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | For all models, we use the Adam solver with a base learning rate α = 0.001, β1 = 0.9, and β2 = 0.99 and train the models for up to 100 epochs with a batch size of 100. The early stopping strategy is used if the accuracy on the validation set does not improve for 10 epochs. All models use the pre-trained 300-dimensional GloVe embedding (Pennington, Socher, and Manning 2014) to initialize the question embedding layer. For the models using LSTM networks, the number of LSTM hidden units is set to 300, and the common space dimension is set to 256 as suggested by (Xu et al. 2017). The number of memory units for E-MN is set to 500 as suggested by (Zeng et al. 2017).
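The experiment-setup row above can be condensed into a small sketch. This is an illustrative reconstruction, not code from the paper: the names `TRAIN_CONFIG` and `should_stop_early` are hypothetical, and the early-stopping rule is implemented from the prose description ("stop if validation accuracy does not improve for 10 epochs").

```python
# Hedged sketch of the training configuration reported in the paper.
# All identifiers here are illustrative; the paper releases no code.
TRAIN_CONFIG = {
    "optimizer": "Adam",             # base learning rate alpha = 0.001
    "learning_rate": 0.001,
    "beta1": 0.9,
    "beta2": 0.99,
    "max_epochs": 100,
    "batch_size": 100,
    "early_stopping_patience": 10,   # epochs without val-accuracy improvement
    "glove_dim": 300,                # pre-trained GloVe question embedding
    "lstm_hidden_units": 300,        # for LSTM-based models
    "common_space_dim": 256,         # as suggested by Xu et al. 2017
    "emn_memory_units": 500,         # E-MN only, as suggested by Zeng et al. 2017
}

def should_stop_early(val_accuracies, patience=10):
    """Return True once the best validation accuracy is `patience`
    or more epochs old, i.e. no improvement for `patience` epochs."""
    if len(val_accuracies) <= patience:
        return False
    best_epoch = max(range(len(val_accuracies)), key=val_accuracies.__getitem__)
    return len(val_accuracies) - 1 - best_epoch >= patience
```

Under this reading, training halts at the first epoch whose gap to the best validation epoch reaches the patience threshold, capped at 100 epochs overall.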