Knowledge-Constrained Answer Generation for Open-Ended Video Question Answering
Authors: Yao Jin, Guocheng Niu, Xinyan Xiao, Jian Zhang, Xi Peng, Jun Yu
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model on two open-ended benchmark datasets to demonstrate that we can effectively and robustly generate high-quality answers without restrictions of training data. [...] On two open-ended benchmark datasets (i.e., NExT-QA (Xiao et al. 2021), TGIF-QA (Jang et al. 2017)), we conduct extensive experiments and obtain the state-of-the-art results. |
| Researcher Affiliation | Collaboration | 1 Hangzhou Dianzi University, 2 Baidu Inc., 3 Zhejiang International Studies University, 4 College of Computer Science, Sichuan University |
| Pseudocode | No | The paper provides mathematical equations describing the model's components and their interactions (e.g., equations 1-13) and describes the system architecture. However, it does not include a distinct block labeled "Pseudocode" or "Algorithm" with structured steps. |
| Open Source Code | No | The paper does not contain any explicit statement about making the source code for the described methodology publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | The Open-QA Dataset is originally constructed from TGIF-QA (Jang et al. 2017) for open-ended Video QA. [...] The NExT-QA Dataset is a dataset that focuses on video explainability (Xiao et al. 2021). |
| Dataset Splits | Yes | We take the text of the correct answer to the multi-label classification as the answer to the question, and choose 35,862, 7,317 and 8,506 questions with valid answers (whose frequency of occurrence is more than 10 times) from TGIF-QA to build the train, validation and test sets of the Open-QA Dataset. [...] The dataset contains 3,870 train, 570 validation, and 1,000 test videos with 37,523, 5,343 and 9,178 open-ended questions respectively. (A split-construction sketch is given after this table.) |
| Hardware Specification | No | The paper describes the software components and parameters used for training (e.g., optimizer, learning rate, batch size, epochs) but does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions several software components and models used, such as "ClipBERT", "GPT2", "Adam", "WordPiece", "Faster R-CNN with ResNet-101", and "CLIP". However, it does not provide specific version numbers for any of these dependencies. |
| Experiment Setup | Yes | During the training phase, we set an initial learning rate of 5e-5 to warm up in the first 10% of training steps, then let it decay linearly to 0. The batch size is set to 256 and the dropout rate is set to 0.3. For each task, we train the model for 50 epochs. (A minimal scheduler sketch follows the table.) |
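
The split construction described under Dataset Splits amounts to filtering TGIF-QA question-answer pairs by answer frequency and then partitioning the survivors. The sketch below is a minimal illustration under stated assumptions, not the authors' code: the JSON-lines input format, the `question`/`answer` field names, and the random 0.69/0.14/0.17 split proportions (chosen only to roughly echo the reported 35,862/7,317/8,506 counts) are all assumptions.

```python
import json
import random
from collections import Counter

def build_open_qa_splits(path, min_freq=10, seed=0):
    """Keep questions whose answer occurs more than `min_freq` times, then split."""
    with open(path) as f:
        # Assumed format: one {"question": ..., "answer": ...} object per line.
        samples = [json.loads(line) for line in f]

    # Count answer frequencies and keep only questions with a "valid" answer.
    answer_counts = Counter(s["answer"] for s in samples)
    valid = [s for s in samples if answer_counts[s["answer"]] > min_freq]

    # Partition into train/validation/test; the proportions are assumptions.
    random.Random(seed).shuffle(valid)
    n = len(valid)
    n_train, n_val = int(0.69 * n), int(0.14 * n)
    train = valid[:n_train]
    val = valid[n_train:n_train + n_val]
    test = valid[n_train + n_val:]
    return train, val, test
```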
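
The Experiment Setup row fully specifies the learning-rate schedule (5e-5, linear warmup over the first 10% of training steps, then linear decay to 0). The snippet below is a hedged sketch of how such a schedule can be reproduced with PyTorch's `LambdaLR`; the paper does not state that PyTorch was used, and the optimizer choice here (Adam, as mentioned under Software Dependencies) and the placeholder `model` are assumptions.

```python
import torch

def build_optimizer_and_scheduler(model, num_training_steps, lr=5e-5, warmup_ratio=0.1):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    warmup_steps = max(1, int(warmup_ratio * num_training_steps))

    def lr_lambda(step):
        if step < warmup_steps:
            # Linear warmup from 0 to the base learning rate.
            return step / warmup_steps
        # Linear decay from the base learning rate down to 0.
        progress = (step - warmup_steps) / max(1, num_training_steps - warmup_steps)
        return max(0.0, 1.0 - progress)

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```

With the reported batch size of 256 and 50 epochs, `num_training_steps` would be `50 * ceil(len(train_set) / 256)`; the scheduler is then stepped once per optimizer update.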