Connective Cognition Network for Directional Visual Commonsense Reasoning
Authors: Aming Wu, Linchao Zhu, Yahong Han, Yi Yang
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the VCR dataset demonstrate the effectiveness of our method. Particularly, in the Q→AR mode, our method is around 4% higher than the state-of-the-art method. |
| Researcher Affiliation | Academia | Aming Wu (1), Linchao Zhu (2), Yahong Han (1), Yi Yang (2); (1) College of Intelligence and Computing, Tianjin University, Tianjin, China; (2) ReLER, University of Technology Sydney, Australia |
| Pseudocode | No | The paper describes its methods through text and mathematical equations but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/AmingWu/CCN. |
| Open Datasets | Yes | We evaluate our method on the VCR dataset, which contains 290k pairs of questions, answers, and rationales over 110k unique movie scenes. ... [42] Rowan Zellers, Yonatan Bisk, Ali Farhadi, and Yejin Choi. From recognition to cognition: Visual commonsense reasoning. In CVPR, 2019. |
| Dataset Splits | Yes | In this section, based on the validation set, we conduct an ablation analysis of our proposed conditional GraphVLAD... The results are shown in Table 1. ... The results table reports Val and Test columns for each of the Q→A, QA→R, and Q→AR modes. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions software components like ResNet50, BERT, LSTM, and the Adam optimizer, but it does not specify their version numbers. |
| Experiment Setup | Yes | Implementation details. ... The size of the hidden state of the LSTM is set to 512. For Eq. (1), we use a one-layer GCN, and 32 centers are used to compute GraphVLAD. For Eq. (2), we separately use a two-layer network to define f and h; their parameters are all set to 1×1024×512 and 1×512×512. Next, we use a one-layer GCN to capture relations between centers; its parameter settings are the same as those of Eq. (1). For contextualized connectivity, we separately use a one-layer GCN to process the query and the response, with the same parameter settings as Eq. (1). For Eq. (5), a one-layer GCN is used for reasoning. Besides, the parameters of the network φ are set to 1×1024×512. During training, we use the Adam optimizer with a learning rate of 2×10⁻³. (A hedged configuration sketch follows the table.) |
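The implementation details above pin down most of the layer shapes and hyperparameters. The minimal PyTorch sketch below shows one way those settings could be wired up; the class names `SimpleGCN` and `GraphVLADSketch`, the LSTM input size, and the overall structure are illustrative assumptions, not the authors' released implementation (which is available at https://github.com/AmingWu/CCN).

```python
# Hedged sketch of the reported configuration; names and structure are
# assumptions for illustration, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGCN(nn.Module):
    """One-layer graph convolution: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim=512, out_dim=512):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h, adj):
        # adj: normalized adjacency matrix (N, N); h: node features (N, in_dim)
        return F.relu(adj @ self.weight(h))

class GraphVLADSketch(nn.Module):
    """VLAD-style soft aggregation over K learnable centers (K = 32 in the paper)."""
    def __init__(self, dim=512, num_centers=32):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_centers, dim))
        self.assign = nn.Linear(dim, num_centers)

    def forward(self, x):
        # x: (N, dim) node features; softly assign each node to the K centers
        a = F.softmax(self.assign(x), dim=-1)               # (N, K) assignments
        resid = x.unsqueeze(1) - self.centers.unsqueeze(0)  # (N, K, dim) residuals
        return (a.unsqueeze(-1) * resid).sum(dim=0)         # (K, dim) descriptor

# Two-layer networks f and h from Eq. (2): 1024 -> 512, then 512 -> 512.
f = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(), nn.Linear(512, 512))
h = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(), nn.Linear(512, 512))

# LSTM with the stated hidden size of 512 (the input size is an assumption).
encoder = nn.LSTM(input_size=512, hidden_size=512, batch_first=True)

params = (list(f.parameters()) + list(h.parameters())
          + list(encoder.parameters()))
optimizer = torch.optim.Adam(params, lr=2e-3)  # learning rate 2 x 10^-3
```

A forward pass would run `SimpleGCN` over region features, aggregate them with `GraphVLADSketch`, and feed the result through `f` and `h`; how these pieces connect to the contextualized-connectivity and reasoning steps is specified only in the released CCN code.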