Location-Aware Graph Convolutional Networks for Video Question Answering
Authors: Deng Huang, Peihao Chen, Runhao Zeng, Qing Du, Mingkui Tan, Chuang Gan
AAAI 2020, pp. 11021-11028 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of the proposed methods. Specifically, our method significantly outperforms state-of-the-art methods on TGIF-QA, Youtube2Text-QA and MSVD-QA datasets. |
| Researcher Affiliation | Collaboration | 1South China University of Technology, 2Peng Cheng Laboratory, Shenzhen, 3MIT-IBM Watson AI Lab |
| Pseudocode | Yes | Algorithm 1 Overall training process. Input: Video frame features; object set R; question Q 1: Construct the location-aware graph G as in Section 3.4 2: while not converges do 3: Extract question features FQ via Eq. (1) 4: Encode object location via Eq. (2), (3) and (4) 5: Compute the node features via Eq. (5) 6: Update adjacent matrix via Eq. (8) 7: Perform reasoning on graph via Eq. (6) 8: Obtain visual features FV via Eq. (10) 9: Obtain FC from FV and FQ via Eq. (12) 10: Predict answers from FC with answer predictor 11: end while Output: Trained model for video QA (see the graph-reasoning sketch after this table) |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing their source code, nor does it provide a direct link to a code repository for the methodology described. |
| Open Datasets | Yes | TGIF-QA (Jang et al. 2017) consists of 165K QA pairs from 72K animated GIFs... Youtube2Text-QA (Ye et al. 2017) includes the videos from MSVD video set (Chen and Dolan 2011) and the question-answer pairs collected from Youtube2Text (Guadarrama et al. 2013) video description corpus. MSVD-QA (Xu et al. 2017) is based on MSVD video set. |
| Dataset Splits | No | The paper does not explicitly provide specific train/validation/test dataset splits, percentages, or sample counts, nor does it reference predefined splits with citations for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions several software components like GloVe, Mask R-CNN, and Adam optimizer, but it does not provide specific version numbers for any of these or other key software dependencies. |
| Experiment Setup | Yes | By default, K is set to 5. The number of GCN layers is set to 2. We employ an Adam optimizer (Kingma and Ba 2015) to train the network with an initial learning rate of 1e-4. We set the batch size to 64 and 128 for multiple-choice and open-ended tasks, respectively. |
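
Since no code repository is linked, the following is a minimal, hypothetical PyTorch sketch of the graph-reasoning step outlined in Algorithm 1 (steps 4-8): object appearance features are concatenated with an encoded bounding-box location to form node features, an adjacency matrix is recomputed from pairwise node similarity, and stacked GCN layers update the nodes before pooling into a visual feature. The module name `LocationAwareGCN`, the feature dimensions, and the similarity/normalization choices are assumptions rather than the authors' implementation; question encoding, attention, and the answer predictor are omitted.

```python
# Hypothetical sketch (not the authors' released code): a simplified
# location-aware graph reasoning block in PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocationAwareGCN(nn.Module):  # hypothetical module name
    def __init__(self, app_dim=2048, loc_dim=64, hid_dim=512, num_layers=2):
        super().__init__()
        self.loc_enc = nn.Linear(4, loc_dim)                 # encode (x, y, w, h) of each object
        self.node_proj = nn.Linear(app_dim + loc_dim, hid_dim)
        # stack of GCN layers: X <- ReLU(A X W)
        self.gcn_layers = nn.ModuleList(
            nn.Linear(hid_dim, hid_dim) for _ in range(num_layers)
        )

    def forward(self, app_feat, boxes):
        # app_feat: (B, N, app_dim) object appearance features (e.g. from Mask R-CNN)
        # boxes:    (B, N, 4) normalized object locations
        loc_feat = torch.relu(self.loc_enc(boxes))
        nodes = torch.relu(self.node_proj(torch.cat([app_feat, loc_feat], dim=-1)))

        for layer in self.gcn_layers:
            # adjacency from pairwise node similarity, row-normalized with softmax (an assumed stand-in for Eq. (8))
            adj = F.softmax(torch.bmm(nodes, nodes.transpose(1, 2)), dim=-1)
            nodes = torch.relu(layer(torch.bmm(adj, nodes)))

        return nodes.mean(dim=1)                             # pooled visual feature F_V


if __name__ == "__main__":
    model = LocationAwareGCN()
    app = torch.randn(2, 5, 2048)                            # 2 clips, K = 5 objects each (K from the paper)
    box = torch.rand(2, 5, 4)
    print(model(app, box).shape)                             # torch.Size([2, 512])
```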
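The reported experiment setup maps directly onto a short training configuration. The snippet below is a hedged illustration of only the values quoted above (Adam, initial learning rate 1e-4, batch size 64 for multiple-choice and 128 for open-ended tasks); the model object is a stand-in so the snippet runs on its own.

```python
# Hedged illustration of the reported optimization setup; the model is a placeholder.
import torch
import torch.nn as nn

model = nn.Linear(512, 1000)                                # placeholder for the full video QA model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # Adam, initial learning rate 1e-4 (Kingma and Ba 2015)
batch_size = 64                                             # 64 for multiple-choice tasks, 128 for open-ended tasks
```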