Connective Cognition Network for Directional Visual Commonsense Reasoning

Authors: Aming Wu, Linchao Zhu, Yahong Han, Yi Yang

NeurIPS 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experimental results on the VCR dataset demonstrate the effectiveness of our method. Particularly, in Q→AR mode, our method is around 4% higher than the state-of-the-art method." (Q→AR scoring is sketched below the table.) |
| Researcher Affiliation | Academia | "Aming Wu¹, Linchao Zhu², Yahong Han¹, Yi Yang². ¹College of Intelligence and Computing, Tianjin University, Tianjin, China. ²ReLER, University of Technology Sydney, Australia" |
| Pseudocode | No | "The paper describes its methods through text and mathematical equations but does not include any explicit pseudocode or algorithm blocks." |
| Open Source Code | Yes | "The code is available at https://github.com/AmingWu/CCN." |
| Open Datasets | Yes | "We evaluate our method on the VCR dataset. And this dataset contains 290k pairs of questions, answers, and rationales, over 110k unique movie scenes. ... [42] Rowan Zellers, Yonatan Bisk, Ali Farhadi, and Yejin Choi. From recognition to cognition: Visual commonsense reasoning. In CVPR, 2019." |
| Dataset Splits | Yes | "In this section, based on the validation set, we make ablation analysis for our proposed conditional GraphVLAD... The results are shown in Table 1. ... Model; Q→A (Val / Test); QA→R (Val / Test); Q→AR (Val / Test)" |
| Hardware Specification | No | "The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models." |
| Software Dependencies | No | "The paper mentions software components like ResNet50, BERT, LSTM, and the Adam optimizer, but it does not specify their version numbers." |
| Experiment Setup | Yes | "Implementation details. ... The size of the hidden state of the LSTM is set to 512. For Eq. (1), we use a one-layer GCN. And 32 centers are used to compute GraphVLAD. For Eq. (2), we separately use a two-layer network to define f and h. Their parameters are all set to 1×1024×512 and 1×512×512. Next, we use a one-layer GCN to capture relations between centers. And the parameter settings of the GCN are the same as those of Eq. (1). For contextualized connectivity, we separately use a one-layer GCN to process query and response. Their parameter settings are the same as those of Eq. (1). For Eq. (5), a one-layer GCN is used for reasoning. Besides, the parameters of the network φ are set to 1×1024×512. During training, we use the Adam optimizer with a learning rate of 2×10⁻³." (See the configuration sketch below the table.) |
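
For readers unfamiliar with the three VCR evaluation modes quoted above: in the task definition of Zellers et al. [42], Q→A asks the model to choose the correct answer, QA→R to choose the correct rationale given the correct answer, and Q→AR counts a prediction as correct only when both choices are right. A minimal sketch of that scoring (variable names are illustrative, not from the paper or its code):

```python
# Minimal sketch of VCR's three evaluation modes (Q->A, QA->R, Q->AR), as
# defined in Zellers et al. (2019): a Q->AR prediction is correct only if
# both the answer and the rationale are chosen correctly.
import numpy as np

def vcr_accuracies(ans_pred, ans_gold, rat_pred, rat_gold):
    q_a = ans_pred == ans_gold    # Q->A: correct answer chosen
    qa_r = rat_pred == rat_gold   # QA->R: correct rationale chosen
    q_ar = q_a & qa_r             # Q->AR: both must be correct
    return q_a.mean(), qa_r.mean(), q_ar.mean()

# Toy example: 4 questions, each index picks one of 4 candidates.
ans_pred, ans_gold = np.array([0, 2, 1, 3]), np.array([0, 2, 0, 3])
rat_pred, rat_gold = np.array([1, 1, 2, 0]), np.array([1, 3, 2, 0])
print(vcr_accuracies(ans_pred, ans_gold, rat_pred, rat_gold))
# -> (0.75, 0.75, 0.5): joint accuracy is at most the lower of the two.
```

This conjunction is why Q→AR numbers are always the lowest of the three columns in Table 1.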
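The 32 "centers" in the Experiment Setup row belong to the paper's conditional GraphVLAD. As background only, here is a hedged sketch of plain NetVLAD-style soft aggregation over 32 learnable centers; the paper's variant additionally runs GCNs over the centers and conditions them on the input, which this sketch omits, so treat it as context rather than the authors' method:

```python
# Hedged sketch of VLAD aggregation over 32 learnable centers, in the spirit
# of NetVLAD (Arandjelovic et al., 2016). The conditioning and graph
# convolutions of the paper's conditional GraphVLAD are NOT included.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftVLAD(nn.Module):
    def __init__(self, dim=512, num_centers=32):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_centers, dim))
        self.assign = nn.Linear(dim, num_centers)  # soft-assignment logits

    def forward(self, x):
        # x: (N, dim) local features, e.g. visual region descriptors
        a = F.softmax(self.assign(x), dim=-1)        # (N, K) assignments
        resid = x.unsqueeze(1) - self.centers        # (N, K, dim) residuals
        vlad = (a.unsqueeze(-1) * resid).sum(dim=0)  # (K, dim) aggregate
        return F.normalize(vlad.flatten(), dim=0)    # (K * dim,) descriptor

features = torch.randn(36, 512)     # e.g., 36 region features
descriptor = SoftVLAD()(features)   # 32 * 512 = 16384-dim output
```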
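Finally, a minimal PyTorch rendering of the reported hyper-parameters, assuming "1×1024×512" denotes a 1×1 convolution mapping 1024 to 512 channels and that Eq. (1)'s GCN follows the standard H' = ReLU(ÂHW) form (the paper's exact formulation may differ); the module and variable names are mine, not the authors':

```python
# Sketch of the reported settings: one-layer GCNs, two-layer f and h
# (1024 -> 512 -> 512 via 1x1 convolutions), LSTM hidden size 512, and
# Adam with learning rate 2e-3. Assumptions are noted in the lead-in.
import torch
import torch.nn as nn

class OneLayerGCN(nn.Module):
    """Single graph-convolution layer: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim=512, out_dim=512):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h, a_hat):
        # h: (N, in_dim) node features; a_hat: (N, N) normalized adjacency
        return torch.relu(a_hat @ self.proj(h))

# Two-layer networks f and h from Eq. (2): 1024 -> 512, then 512 -> 512.
f = nn.Sequential(nn.Conv1d(1024, 512, kernel_size=1), nn.ReLU(),
                  nn.Conv1d(512, 512, kernel_size=1))
h = nn.Sequential(nn.Conv1d(1024, 512, kernel_size=1), nn.ReLU(),
                  nn.Conv1d(512, 512, kernel_size=1))

gcn = OneLayerGCN()                              # as in Eqs. (1) and (5)
lstm = nn.LSTM(input_size=512, hidden_size=512)  # hidden state size 512
params = [*f.parameters(), *h.parameters(),
          *gcn.parameters(), *lstm.parameters()]
optimizer = torch.optim.Adam(params, lr=2e-3)    # reported learning rate
```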