Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Location-Aware Graph Convolutional Networks for Video Question Answering
Authors: Deng Huang, Peihao Chen, Runhao Zeng, Qing Du, Mingkui Tan, Chuang Gan
Pages: 11021-11028
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of the proposed methods. Specifically, our method significantly outperforms state-of-the-art methods on TGIF-QA, Youtube2Text-QA and MSVD-QA datasets. |
| Researcher Affiliation | Collaboration | ¹South China University of Technology, ²Peng Cheng Laboratory, Shenzhen, ³MIT-IBM Watson AI Lab |
| Pseudocode | Yes | Algorithm 1 Overall training process. Input: Video frame features; object set R; question Q 1: Construct the location-aware graph G as in Section 3.4 2: while not converges do 3: Extract question features FQ via Eq. (1) 4: Encode object location via Eq. (2), (3) and (4) 5: Compute the node features via Eq. (5) 6: Update adjacent matrix via Eq. (8) 7: Perform reasoning on graph via Eq. (6) 8: Obtain visual features FV via Eq. (10) 9: Obtain FC from FV and FQ via Eq. (12) 10: Predict answers from FC with answer predictor 11: end while Output: Trained model for video QA |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing their source code, nor does it provide a direct link to a code repository for the methodology described. |
| Open Datasets | Yes | TGIF-QA (Jang et al. 2017) consists of 165K QA pairs from 72K animated GIFs... Youtube2Text-QA (Ye et al. 2017) includes the videos from MSVD video set (Chen and Dolan 2011) and the question-answer pairs collected from Youtube2Text (Guadarrama et al. 2013) video description corpus. MSVD-QA (Xu et al. 2017) is based on MSVD video set. |
| Dataset Splits | No | The paper does not explicitly provide specific train/validation/test dataset splits, percentages, or sample counts, nor does it reference predefined splits with citations for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions several software components like GloVe, Mask R-CNN, and Adam optimizer, but it does not provide specific version numbers for any of these or other key software dependencies. |
| Experiment Setup | Yes | By default, K is set to 5. The number of GCNs layers is set to 2. We employ a Adam optimizer (Kingma and Ba 2015) to train the network with an initial learning rate of 1e-4. We set the batch size to 64 and 128 for multiple-choice and open-ended tasks, respectively. |
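
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object, which may be a convenient starting point for a reproduction attempt. The following is a minimal sketch, not the authors' code: the names `TrainConfig`, `BATCH_SIZE_BY_TASK`, and `make_config` are hypothetical, and only the numeric values and the choice of Adam come from the row above. Dataset splits, hardware, and software versions are not reported in the paper, so they are deliberately left out.

```python
from dataclasses import dataclass

# Batch sizes quoted in the Experiment Setup row.
# The task keys are hypothetical labels chosen for this sketch.
BATCH_SIZE_BY_TASK = {"multiple_choice": 64, "open_ended": 128}


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters reported in the paper (field names are assumptions)."""

    task: str
    k: int = 5                    # "By default, K is set to 5"
    num_gcn_layers: int = 2       # number of GCN layers
    optimizer: str = "adam"       # Adam (Kingma and Ba 2015)
    learning_rate: float = 1e-4   # initial learning rate

    @property
    def batch_size(self) -> int:
        # 64 for multiple-choice tasks, 128 for open-ended tasks
        return BATCH_SIZE_BY_TASK[self.task]


def make_config(task: str) -> TrainConfig:
    """Build a config for one task type, rejecting unknown task names."""
    if task not in BATCH_SIZE_BY_TASK:
        raise ValueError(f"unknown task type: {task!r}")
    return TrainConfig(task=task)
```

Calling `make_config("open_ended")` yields a config with the defaults above and a batch size of 128. Any value not stated in the row, such as the number of epochs or a learning-rate schedule, would have to be chosen by whoever attempts the reproduction.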