DMRM: A Dual-Channel Multi-Hop Reasoning Model for Visual Dialog

Authors: Feilong Chen, Fandong Meng, Jiaming Xu, Peng Li, Bo Xu, Jie Zhou

AAAI 2020, pp. 7504-7511 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the VisDial v0.9 and v1.0 datasets demonstrate that the proposed model is effective and outperforms compared models by a significant margin. ... We validate the DMRM model on the large-scale VisDial v0.9 and v1.0 datasets (Das et al. 2017). DMRM achieves state-of-the-art results on some metrics compared to other methods. We also conduct ablation studies to demonstrate the effectiveness of our proposed components. Furthermore, we conduct a human evaluation to indicate the effectiveness of our model in inferring answers.
Researcher Affiliation | Collaboration | Feilong Chen (1,2,3,4), Fandong Meng (2), Jiaming Xu (1,3), Peng Li (2), Bo Xu (1,3,4,5), Jie Zhou (2). Affiliations: 1 Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing, China; 2 Pattern Recognition Center, WeChat AI, Tencent Inc., China; 3 Research Center for Brain-inspired Intelligence, CASIA; 4 University of Chinese Academy of Sciences; 5 Center for Excellence in Brain Science and Intelligence Technology, CAS, China.
Pseudocode | No | The paper describes mathematical equations and processes but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Code is available at https://github.com/phellonchen/DMRM.
Open Datasets | Yes | We evaluate our proposed approach on the VisDial v0.9 and v1.0 datasets (Das et al. 2017).
Dataset Splits | Yes | VisDial v0.9 contains 83k dialogs on COCO-train images and 40k dialogs on COCO-val images (Lu et al. 2017a), for a total of 1.23M dialog question-answer pairs. The VisDial v1.0 dataset is an extension of VisDial v0.9 with an additional 10k COCO-like images from Flickr. Overall, the VisDial v1.0 dataset contains 123k (all images from v0.9), 2k, and 8k images as train, validation, and test splits, respectively (summarized in the first sketch after the table).
Hardware Specification | No | The paper describes the models, data, and training process but does not mention any specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments.
Software Dependencies | No | The paper mentions tools such as 'Faster R-CNN' and 'GloVe embeddings', and the 'Adam optimizer', but does not specify version numbers for these components or for other libraries (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | The captions, questions and answers are further truncated to ensure that they are no longer than 24, 16 or 8 tokens, respectively. ... All the BiLSTMs in our model are 1-layered with 512 hidden states. The Adam optimizer (Kingma and Ba 2014) is used with a base learning rate of 1e-3, further decreasing to 1e-5 with a warm-up process (see the second sketch after the table).
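
For quick reference, a minimal sketch of the quoted split statistics as a Python dictionary. The counts come from the Dataset Splits row above; the variable name is illustrative, and note that the v0.9 figures count dialogs while the v1.0 figures count images.

    # Split statistics quoted from the paper (variable name is illustrative).
    # v0.9 counts are dialogs; v1.0 counts are images.
    VISDIAL_SPLITS = {
        "v0.9": {"train_dialogs": 83_000, "val_dialogs": 40_000},  # ~1.23M QA pairs in total
        "v1.0": {"train_images": 123_000, "val_images": 2_000, "test_images": 8_000},
    }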
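
To make the quoted experiment setup concrete, here is a minimal PyTorch sketch of a 1-layer BiLSTM with 512 hidden states trained with Adam and a warm-up schedule. The embedding size, warm-up length, and decay rule are assumptions, not taken from the paper or the authors' repository; the paper only states the base (1e-3) and final (1e-5) learning rates and that a warm-up process is used.

    import torch
    import torch.nn as nn

    # Truncation lengths reported in the paper.
    MAX_CAPTION_LEN, MAX_QUESTION_LEN, MAX_ANSWER_LEN = 24, 16, 8
    HIDDEN_SIZE = 512  # hidden states per direction, as reported

    # 1-layer bidirectional LSTM, matching "All the BiLSTMs ... are
    # 1-layered with 512 hidden states".
    encoder = nn.LSTM(
        input_size=300,      # assumed GloVe-300d embeddings; dimension not stated
        hidden_size=HIDDEN_SIZE,
        num_layers=1,
        bidirectional=True,
        batch_first=True,
    )

    optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

    # Warm-up then decay from 1e-3 toward 1e-5. The exact schedule is not
    # specified, so the linear warm-up and exponential decay are assumptions.
    BASE_LR, FINAL_LR, WARMUP_STEPS, DECAY = 1e-3, 1e-5, 1000, 0.9995

    def lr_factor(step: int) -> float:
        """Multiplier applied to the base learning rate at a given step."""
        if step < WARMUP_STEPS:
            return (step + 1) / WARMUP_STEPS  # linear warm-up to 1e-3
        lr = BASE_LR * DECAY ** (step - WARMUP_STEPS)  # exponential decay
        return max(FINAL_LR, lr) / BASE_LR             # floor at 1e-5

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)

In a training loop one would call scheduler.step() once per iteration so the learning rate rises linearly during warm-up and then decays toward the 1e-5 floor.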