Movie Question Answering: Remembering the Textual Cues for Layered Visual Contents
Authors: Bo Wang, Youjiang Xu, Yahong Han, Richang Hong
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on the MovieQA dataset. With only visual content as inputs, LMN with frame-level representation obtains a large performance improvement. When incorporating subtitles into LMN to form the clip-level representation, we achieve the state-of-the-art performance on the online evaluation task of 'Video+Subtitles'. The good performance successfully demonstrates that the proposed framework of LMN is effective and the hierarchically formed movie representations have good potential for the applications of movie question answering. |
| Researcher Affiliation | Academia | Bo Wang, Youjiang Xu, Yahong Han (School of Computer Science and Technology, Tianjin University, Tianjin, China; {bwong, yjxu, yahong}@tju.edu.cn); Richang Hong (School of Computer and Information, Hefei University of Technology, Hefei, China; hongrc.hfut@gmail.com) |
| Pseudocode | No | The paper describes the method using text and diagrams but does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not state that the source code for their methodology is publicly available, nor does it provide a link to a repository. |
| Open Datasets | Yes | We evaluate the Layered Memory Network on the MovieQA dataset (Tapaswi et al. 2016), which contains multiple sources of information such as video clips, plots, subtitles, and scripts. |
| Dataset Splits | Yes | The 6,462 question-answer pairs are split into 4,318, 886, and 1,258 for the training, validation, and test sets, respectively. Also, the 140 movies (6,771 clips in total) are split into 4,385, 1,098, and 1,288 clips for the training, validation, and test sets, respectively. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using a 'word2vec model' and a 'skip-gram model' but does not specify versions for broader software dependencies such as the programming language, deep learning framework, or other libraries. A hedged embedding sketch follows the table. |
| Experiment Setup | Yes | For training our LMN model, all the model parameters are optimized by minimizing the cross-entropy loss using stochastic gradient descent. The batch size is set to 8 and the learning rate is set to 0.01. We perform early stopping on the dev set (10% of the training set). A hedged training-loop sketch follows the table. |
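
Since the paper names only a word2vec skip-gram model without specifying a library or version, the sketch below shows one plausible way to obtain such embeddings for subtitle text using gensim. The library choice, corpus, and all parameter values (vector size, window, minimum count) are assumptions for illustration, not details taken from the paper.

```python
# Hedged sketch: skip-gram word2vec embeddings for subtitle sentences.
# gensim is an assumed dependency; the paper does not name a library or version.
from gensim.models import Word2Vec

# Toy subtitle corpus; in practice this would be the tokenized MovieQA subtitles.
subtitle_sentences = [
    ["where", "did", "the", "scene", "take", "place"],
    ["the", "characters", "meet", "at", "the", "train", "station"],
]

# sg=1 selects the skip-gram objective mentioned in the paper;
# vector_size, window, and min_count are illustrative defaults, not reported values.
model = Word2Vec(
    sentences=subtitle_sentences,
    vector_size=300,
    window=5,
    min_count=1,
    sg=1,
)

# Look up the embedding of a single word.
vec = model.wv["scene"]
print(vec.shape)  # (300,)
```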
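
To make the reported training configuration concrete, here is a minimal PyTorch-style sketch of the optimization loop described in the Experiment Setup row: SGD, cross-entropy loss, batch size 8, learning rate 0.01, and early stopping on a 10% dev split. The placeholder scorer, feature dimensions, epoch count, and patience value are assumptions; only the quoted hyperparameters come from the paper, and the real model would be the LMN rather than a linear layer.

```python
# Hedged sketch of the reported training setup: SGD, cross-entropy loss,
# batch size 8, learning rate 0.01, early stopping on a 10% dev split.
# The answer-scoring model below is a placeholder, not the paper's LMN.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split

# Toy features: 1,000 questions, 512-d fused question/movie representations,
# 5 candidate answers per question (MovieQA is 5-way multiple choice).
features = torch.randn(1000, 512)
labels = torch.randint(0, 5, (1000,))
dataset = TensorDataset(features, labels)

# Hold out 10% of the training set as a dev set for early stopping.
dev_size = len(dataset) // 10
train_set, dev_set = random_split(dataset, [len(dataset) - dev_size, dev_size])
train_loader = DataLoader(train_set, batch_size=8, shuffle=True)
dev_loader = DataLoader(dev_set, batch_size=8)

model = nn.Linear(512, 5)  # placeholder answer scorer standing in for LMN
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

best_dev_acc, patience, bad_epochs = 0.0, 3, 0
for epoch in range(50):
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

    # Early stopping: track accuracy on the held-out dev set.
    model.eval()
    with torch.no_grad():
        correct = sum((model(x).argmax(dim=1) == y).sum().item() for x, y in dev_loader)
    dev_acc = correct / len(dev_set)
    if dev_acc > best_dev_acc:
        best_dev_acc, bad_epochs = dev_acc, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```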