Movie Question Answering: Remembering the Textual Cues for Layered Visual Contents

Authors: Bo Wang, Youjiang Xu, Yahong Han, Richang Hong

AAAI 2018

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We conduct extensive experiments on the MovieQA dataset. With only visual content as inputs, LMN with frame-level representation obtains a large performance improvement. When incorporating subtitles into LMN to form the clip-level representation, we achieve the state-of-the-art performance on the online evaluation task of Video+Subtitles. The good performance successfully demonstrates that the proposed framework of LMN is effective and the hierarchically formed movie representations have good potential for the applications of movie question answering." |
| Researcher Affiliation | Academia | Bo Wang, Youjiang Xu, Yahong Han, School of Computer Science and Technology, Tianjin University, Tianjin, China ({bwong, yjxu, yahong}@tju.edu.cn); Richang Hong, School of Computer and Information, Hefei University of Technology, Hefei, China (hongrc.hfut@gmail.com) |
| Pseudocode | No | The paper describes the method using text and diagrams but does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not state that the source code for its methodology is publicly available, nor does it provide a link to a repository. |
| Open Datasets | Yes | "We evaluate the Layered Memory Network on the MovieQA dataset (Tapaswi et al. 2016), which contains multiple sources of information such as video clips, plots, subtitles, and scripts." |
| Dataset Splits | Yes | "The 6,462 question-answer pairs are split into 4,318, 886, and 1,258 for the training, validation, and test sets, respectively. Also, the 140 movies (6,771 clips in total) are split into 4,385, 1,098, and 1,288 clips for the training, validation, and test sets, respectively." |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using a 'word2vec model' and a 'skip-gram model' but does not specify versions of broader software dependencies such as the programming language, deep learning framework, or other libraries. |
| Experiment Setup | Yes | "For training our LMN model, all the model parameters are optimized by minimizing the cross-entropy loss using stochastic gradient descent. The batch size is set to 8 and the learning rate is set to 0.01. We perform early stopping on the dev set (10% of the training set)." |
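The split sizes quoted in the Dataset Splits row can be sanity-checked with a few lines of arithmetic. The dictionaries below simply restate the numbers from the paper (a stdlib-only sketch):

```python
# Sanity check of the MovieQA split sizes quoted in the table above.
qa_splits = {"train": 4318, "val": 886, "test": 1258}
clip_splits = {"train": 4385, "val": 1098, "test": 1288}

assert sum(qa_splits.values()) == 6462   # total QA pairs reported
assert sum(clip_splits.values()) == 6771 # total clips reported
print(sum(qa_splits.values()), sum(clip_splits.values()))  # prints: 6462 6771
```

Both totals match the paper's stated counts of 6,462 QA pairs and 6,771 clips.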
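The Experiment Setup row names every optimization detail the paper gives: SGD on a cross-entropy loss, batch size 8, learning rate 0.01, and early stopping on a dev set carved from 10% of the training data. The sketch below illustrates that loop in NumPy on toy data, with a plain linear scorer over 5 answer candidates standing in for the LMN model; the data, scorer, and patience value are all illustrative assumptions, since the paper releases no code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in data: 100 examples, feature dim 16, 5 answer candidates
# (MovieQA questions have 5 candidate answers).
X = rng.normal(size=(100, 16))
y = rng.integers(0, 5, size=100)

# Hold out 10% of the training set as a dev set for early stopping.
n_dev = len(X) // 10
X_dev, y_dev = X[:n_dev], y[:n_dev]
X_tr, y_tr = X[n_dev:], y[n_dev:]

W = np.zeros((16, 5))            # linear scorer standing in for LMN
lr, batch_size = 0.01, 8         # hyperparameters quoted from the paper

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def dev_accuracy():
    return float((np.argmax(X_dev @ W, axis=1) == y_dev).mean())

best_acc, patience = -1.0, 3     # patience value is an assumption
for epoch in range(50):
    order = rng.permutation(len(X_tr))
    for start in range(0, len(X_tr), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X_tr[idx], y_tr[idx]
        grad = softmax(xb @ W)                 # cross-entropy gradient is
        grad[np.arange(len(idx)), yb] -= 1.0   # softmax(z) - one_hot(y)
        W -= lr * (xb.T @ grad) / len(idx)     # SGD step
    acc = dev_accuracy()
    if acc > best_acc:
        best_acc, patience = acc, 3
    else:
        patience -= 1
        if patience == 0:
            break                # early stopping on dev accuracy

print(round(best_acc, 2))
```

The mini-batch size, learning rate, and 10% dev split match the quoted setup; everything model-specific is a placeholder.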