Modality-Balanced Models for Visual Dialogue
Authors: Hyounghun Kim, Hao Tan, Mohit Bansal
AAAI 2020, pp. 8091-8098
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, our models achieve strong results on the Visual Dialog challenge 2019 (rank 3 on NDCG and high balance across metrics), and substantially outperform the winner of the Visual Dialog challenge 2018 on most metrics. We first conduct a manual investigation on the Visual Dialog dataset (VisDial) to figure out how many questions can be answered only with images and how many of them need conversation history to be answered. |
| Researcher Affiliation | Academia | Hyounghun Kim, Hao Tan, Mohit Bansal Department of Computer Science University of North Carolina at Chapel Hill {hyounghk, airsplay, mbansal}@cs.unc.edu |
| Pseudocode | No | No pseudocode or algorithm blocks are provided. The model architecture and calculations are described using mathematical formulas and text. |
| Open Source Code | No | No explicit statement about releasing source code or a link to a code repository for the methodology described in this paper is provided. |
| Open Datasets | Yes | We use the VisDial v1.0 (Das et al. 2017) dataset to train our models |
| Dataset Splits | Yes | The whole dataset is split into 123,287/2,000/8,000 images for train/validation/test, respectively. |
| Hardware Specification | No | No specific hardware details such as GPU models, CPU types, or memory specifications used for running experiments are provided. |
| Software Dependencies | No | The paper mentions using Adam as an optimizer, LSTM-RNN, Faster R-CNN, and MFB, but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions). |
| Experiment Setup | Yes | In our models, the size of word vectors is 300, the dimension of the visual features is 2048, and the hidden size of the LSTM units used as encoders for questions, context history, and candidate answers is 512. We set the initial learning rate to 0.001, decrease it by 0.0001 per epoch until the 8th epoch, and decay it by 0.5 from the 9th epoch onward. For round dropout, we set the maximum number of history features to be dropped to 3, and we tune the p value to 0.25 for the instance dropout in the consensus dropout fusion module. Cross-entropy is used to calculate the loss. |
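
The reported setup translates directly into a training configuration. Below is a minimal sketch, assuming PyTorch; the `CONFIG` dict, the `lr_factor` helper, and the stand-in `model` are illustrative names of my own, and the exact reading of the per-epoch learning-rate schedule is one interpretation of the paper's description, not the authors' released code.

```python
import torch
import torch.nn as nn

# Hyperparameters as reported in the paper's experiment setup.
CONFIG = {
    "word_vec_dim": 300,         # size of word vectors
    "visual_feat_dim": 2048,     # dimension of visual features
    "lstm_hidden_size": 512,     # hidden size of question/history/answer encoders
    "base_lr": 1e-3,             # initial learning rate
    "round_dropout_max": 3,      # max number of history features dropped (round dropout)
    "instance_dropout_p": 0.25,  # p for instance dropout in consensus dropout fusion
}

def lr_factor(epoch: int) -> float:
    """Multiplier on the base learning rate for a 0-indexed epoch.

    One reading of the schedule: subtract 1e-4 per epoch until the 8th
    epoch, then halve the rate every epoch from the 9th epoch onward.
    """
    base = CONFIG["base_lr"]
    if epoch < 8:
        lr = base - 1e-4 * epoch
    else:
        lr = (base - 1e-4 * 7) * (0.5 ** (epoch - 7))
    return lr / base

# `model` stands in for the full visual-dialogue network, which is not shown here.
model = nn.LSTM(CONFIG["word_vec_dim"], CONFIG["lstm_hidden_size"], batch_first=True)
optimizer = torch.optim.Adam(model.parameters(), lr=CONFIG["base_lr"])
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)
criterion = nn.CrossEntropyLoss()  # cross-entropy over candidate answers
```

In a training loop, `scheduler.step()` would be called once per epoch to apply the piecewise schedule sketched above.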