Leveraging Video Descriptions to Learn Video Question Answering
Authors: Kuo-Hao Zeng, Tseng-Hung Chen, Ching-Yao Chuang, Yuan-Hong Liao, Juan Carlos Niebles, Min Sun
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we evaluate performance on manually generated video-based QA pairs. The results show that our self-paced learning procedure is effective, and the extended SS model outperforms various baselines. |
| Researcher Affiliation | Academia | Department of Electrical Engineering, National Tsing Hua University; Department of Computer Science, Stanford University |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | Available at http://aliensunmin.github.io/project/videolanguage/ |
| Open Datasets | Yes | We start by crawling an online curated video repository (http://jukinmedia.com/videos) to collect videos with high-quality descriptions. [...] 1Available at http://aliensunmin.github.io/project/videolanguage/ |
| Dataset Splits | Yes | We use 14100 videos and 151263 candidate QA pairs for training, 2000 videos and 21352 candidate QA pairs for validation, and 2000 videos and 2461 ground truth QA pairs for testing. |
| Hardware Specification | No | Only a general reference to 'GPU memory limit' was found. No specific GPU models, CPU models, or other detailed hardware specifications for running experiments were provided. |
| Software Dependencies | No | The paper mentions 'Tensor Flow (et al. 2015)' but does not provide specific version numbers for TensorFlow or any other software libraries or tools. |
| Experiment Setup | Yes | We implement and train all the extended methods using Tensor Flow (et al. 2015) with the batch size of 100 and selected the final model according to the best validation accuracy. Other model-specific training details are described below. E-MN. We use stochastic gradient descent with an initial learning rate of 0.001 [...] Inspired by several memory based models, we set 500 as the number of memories and the LSTM hidden dimension. [...] E-SA. We use the training settings as in (Yao et al. 2015), except for Adam optimization (Kingma and Ba 2015) with initial learning rate of 0.0001. E-SS. [...] We use Adam optimizer (Kingma and Ba 2015) with an initial learning rate of 0.0001. [...] at the first iteration of self-paced learning, we set γ to remove 10% QA pairs with small loss ratio in the training data. |
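The self-paced step quoted above ("we set γ to remove 10% QA pairs with small loss ratio in the training data") can be sketched as a simple percentile filter. This is a minimal illustration, not the authors' code: the function name `filter_qa_pairs`, the `drop_frac` parameter, and the use of a quantile to set γ are assumptions; the paper only states that γ is chosen so that 10% of the training QA pairs with the smallest loss ratio are removed at the first iteration.

```python
import numpy as np

def filter_qa_pairs(loss_ratios, drop_frac=0.10):
    """Sketch of one self-paced selection step.

    loss_ratios: per-QA-pair loss ratio computed by the current model
                 (hypothetical input; the paper defines this ratio).
    drop_frac:   fraction of pairs to remove (0.10 at the first iteration,
                 per the quoted setup).
    Returns a boolean keep-mask and the threshold gamma.
    """
    loss_ratios = np.asarray(loss_ratios, dtype=float)
    # Set gamma so that drop_frac of the pairs fall at or below it.
    gamma = np.quantile(loss_ratios, drop_frac)
    # Keep only pairs whose loss ratio exceeds gamma.
    keep = loss_ratios > gamma
    return keep, gamma

# Usage: with 10 candidate pairs, a 10% drop removes the single
# lowest-ratio pair.
ratios = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
keep, gamma = filter_qa_pairs(ratios, drop_frac=0.10)
```

In later self-paced iterations the threshold would be relaxed or retightened as the model improves; the paper's quoted setup only fixes the 10% value for the first iteration.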