Dual Visual Attention Network for Visual Dialog

Authors: Dan Guo, Hui Wang, Meng Wang

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the VisDial v0.9 and v1.0 datasets validate the effectiveness of the proposed approach.
Researcher Affiliation | Academia | Dan Guo, Hui Wang and Meng Wang, School of Computer Science and Information Engineering, Hefei University of Technology; guodan@hfut.edu.cn, wanghui.hfut@gmail.com, eric.mengwang@gmail.com
Pseudocode | No | The paper does not contain structured pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is open-source or publicly available.
Open Datasets | Yes | We evaluate the proposed model on the VisDial v0.9 and v1.0 [Das et al., 2017] datasets.
Dataset Splits | Yes | VisDial v0.9 contains 83k dialogs on COCO-train images and 40k dialogs on COCO-val images (1.2M QA pairs in total). ... VisDial v1.0 is an updated version of VisDial v0.9, in which VisDial v0.9 is set to be the train split, and the new val and test splits of VisDial v1.0 contain 2k and 8k dialogs collected on COCO-like Flickr images, respectively.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions software components such as VGG19, Faster R-CNN, NLTK, GloVe embeddings, LSTM, the Adam optimizer, and Dropout, but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | The captions, questions, and answers are truncated to 24/16/8 words for generative models, and 40/20/20 words for discriminative models, respectively. Next, each word is embedded into a 300-dim vector initialized by the GloVe embedding [Pennington et al., 2014]. All the LSTMs in our model are 1-layered with 512 hidden states. The Adam optimizer [Kingma and Ba, 2014] is adopted with an initial learning rate of 4 × 10^-4, multiplied by 0.5 after every 20 epochs. We also apply Dropout [Srivastava et al., 2014] with ratio 0.5 for the LSTMs, attention modules, and the output of the encoder. Finally, generative models are trained with an MLE (maximum likelihood estimation) loss, while discriminative models are trained with a multi-class N-pair loss [Lu et al., 2017a].
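
To make the Experiment Setup row concrete, below is a minimal sketch of the reported training configuration. PyTorch is an assumption (the paper does not name its framework), and the ToyEncoder module, vocabulary size, and training loop are hypothetical placeholders rather than the authors' DVAN architecture; only the hyperparameters (300-dim GloVe-initialized embeddings, 1-layer LSTMs with 512 hidden units, Adam at 4e-4 halved every 20 epochs, dropout 0.5, and an MLE loss for generative decoding) follow the quoted setup.

```python
# Minimal sketch of the training configuration quoted above. PyTorch and all
# module/variable names here are assumptions for illustration; this is not the
# authors' DVAN model.
import torch
import torch.nn as nn

VOCAB_SIZE = 10000   # hypothetical vocabulary size (not reported in the paper)
EMBED_DIM = 300      # word embeddings are 300-dim, initialized from GloVe
HIDDEN_DIM = 512     # all LSTMs are 1-layered with 512 hidden states
DROPOUT = 0.5        # dropout ratio 0.5 for LSTMs, attention modules, encoder output


class ToyEncoder(nn.Module):
    """Placeholder text encoder that mirrors the reported hyperparameters."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)  # would be loaded from GloVe vectors
        self.lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, num_layers=1, batch_first=True)
        self.dropout = nn.Dropout(DROPOUT)

    def forward(self, tokens):
        emb = self.embed(tokens)            # (batch, seq_len, 300)
        _, (h, _) = self.lstm(emb)          # final hidden state as the sentence encoding
        return self.dropout(h.squeeze(0))   # (batch, 512)


model = ToyEncoder()

# Adam with initial learning rate 4e-4, multiplied by 0.5 after every 20 epochs.
optimizer = torch.optim.Adam(model.parameters(), lr=4e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)

# Generative decoders are trained with an MLE (token-level cross-entropy) loss;
# the discriminative multi-class N-pair loss over candidate answers is omitted here.
mle_loss = nn.CrossEntropyLoss()

for epoch in range(40):
    # ... one pass over the VisDial training split would go here ...
    scheduler.step()  # decays the learning rate on the reported 20-epoch schedule
```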