Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Dynamic Language Binding in Relational Visual Reasoning
Authors: Thao Minh Le, Vuong Le, Svetha Venkatesh, Truyen Tran
IJCAI 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The effectiveness of this model is demonstrated on image question answering, with favorable performance on major VQA datasets. We apply our model to major VQA datasets. Both qualitative and quantitative results indicate that LOGNet has advantages over state-of-the-art methods in answering long and complex questions. Our results show superior performance even when trained on just 10% of the data. We evaluate our model on multiple datasets, including CLEVR, CLEVR-Human, GQA, and VQA v2. We conduct ablation studies with our model on a CLEVR subset of 10% training data (see Table 4). |
| Researcher Affiliation | Academia | Thao Minh Le, Vuong Le, Svetha Venkatesh and Truyen Tran, Applied Artificial Intelligence Institute, Deakin University, Australia EMAIL |
| Pseudocode | No | The paper describes the model in detail but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code. |
| Open Datasets | Yes | We evaluate our model on multiple datasets including: CLEVR [Johnson et al., 2017a]: presents several reasoning tasks such as transitive relations and attribute comparison. CLEVR-Human [Johnson et al., 2017b]: composes natural language question-answer pairs on images from CLEVR. GQA [Hudson and Manning, 2019b]: the current largest visual relational reasoning dataset providing semantic scene graphs coupled with images. VQA v2 [Goyal et al., 2017]: As a large portion of questions is short and can be answered by looking for facts in images, we design experiments with a split of only long questions (>7 words). |
| Dataset Splits | No | The paper mentions 'Val. Acc. (%)' in tables, indicating a validation set was used, but does not explicitly provide exact split percentages or sample counts for training, validation, and test sets. It mentions '10% of training data' or '20% and 50% splits', which refer to the size of the training subset used, not a complete train/validation/test partition. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU, CPU models, memory) used for running experiments. |
| Software Dependencies | No | The paper mentions software components like Faster R-CNN, biLSTM, GCN, and ResNet, and pretrained GloVe vectors, but it does not specify version numbers for these or any other ancillary software dependencies. |
| Experiment Setup | Yes | Our model is generally implemented with feature dimension d = 512, reasoning depth T = 8, GCN depth H = 8 and attention-width K = 2. The number of regions is N = 14 for CLEVR and CLEVR-Human, and 100 for GQA and 36 for VQA v2 to match with other related methods. We also match the word embeddings with others by using random vectors of a uniform distribution for CLEVR/CLEVR-Human and pretrained GloVe vectors for the other datasets. |
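
For convenience, the hyperparameters quoted in the Experiment Setup row can be gathered into a small configuration sketch. This is a hypothetical reconstruction (the paper releases no official code), and the class and field names below are illustrative assumptions; only the numeric values come from the quoted setup.

```python
from dataclasses import dataclass

# Hypothetical config collecting the hyperparameters reported in the paper's
# experiment setup; names are illustrative, since no official code is released.
@dataclass
class LOGNetConfig:
    feature_dim: int = 512      # d: feature dimension
    reasoning_depth: int = 8    # T: number of reasoning steps
    gcn_depth: int = 8          # H: GCN depth
    attention_width: int = 2    # K: attention width
    num_regions: int = 14       # N: number of visual regions
    word_embedding: str = "uniform_random"  # or "glove" (pretrained GloVe vectors)

# Per-dataset region counts and word-embedding choices, as reported in the paper.
DATASET_CONFIGS = {
    "CLEVR":       LOGNetConfig(num_regions=14,  word_embedding="uniform_random"),
    "CLEVR-Human": LOGNetConfig(num_regions=14,  word_embedding="uniform_random"),
    "GQA":         LOGNetConfig(num_regions=100, word_embedding="glove"),
    "VQA v2":      LOGNetConfig(num_regions=36,  word_embedding="glove"),
}
```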