Exploiting the Social-Like Prior in Transformer for Visual Reasoning
Authors: Yudong Han, Yupeng Hu, Xuemeng Song, Haoyu Tang, Mingzhu Xu, Liqiang Nie
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our model outperforms a series of baselines by a noticeable margin when considering our social-like prior on five benchmarks in VQA and REC tasks, and a series of explanatory results are showcased to sufficiently reveal the social-like behaviors in SA. |
| Researcher Affiliation | Academia | 1School of Software, Shandong University 2School of Computer Science and Technology, Shandong University 3School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen) {hanyudong.sdu, sxmustc, nieliqiang}@gmail.com, {huyupeng, tanghao258, xumingzhu}@sdu.edu.cn |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code (e.g., repository link, explicit statement of code release) for the methodology described. |
| Open Datasets | Yes | Datasets. VQA 2.0 is the most commonly used benchmark dataset for VQA, which is developed based on VQA 1.0. The images stem from Microsoft COCO (Lin et al. 2014). The overall dataset has about 1000K examples, which are split into train, val, and test sets, respectively. CLEVR is a synthetic diagnostic dataset... RefCOCO, RefCOCO+, and RefCOCOg are three commonly used benchmarks for REC. |
| Dataset Splits | Yes | VQA 2.0 is the most commonly used benchmark dataset for VQA... The overall dataset has about 1000K examples, which are split into train, val, and test sets, respectively. CLEVR... 70K/15K images and 700K/150K questions in the train/val sets... RefCOCO... which is split into train, val, testA, and testB sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like 'GloVe embeddings', 'LSTM', 'ResNeXt-152', and 'BERT model' but does not specify their version numbers for reproducibility. |
| Experiment Setup | Yes | In the VQA task, the model configurations for VQA 2.0 and CLEVR are similar... The numbers of training epochs for VQA 2.0 and CLEVR are set to 13 and 16, respectively, and a warming-up strategy is adopted in the first three epochs. The learning rate is initialized to 1e-4 and decayed by a factor of 0.2 at the 10th, 13th, and 15th epochs. The batch size is set to 64. In the REC task... our model is trained for 90 epochs with an initial 1e-4 learning rate dropped by a factor of 10 after 60 epochs, except for RefCOCOg, which is trained for 60 epochs with the drop after 40 epochs, and we set the batch size to 16. |
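The VQA schedule quoted above (initial learning rate 1e-4, warm-up over the first three epochs, step decay by 0.2 at the 10th, 13th, and 15th epochs) can be sketched as a simple per-epoch function. This is a minimal sketch for illustration, not the authors' code; the linear warm-up shape is an assumption, since the paper only states that a warming-up strategy is adopted.

```python
def vqa_lr_schedule(epoch, base_lr=1e-4, warmup_epochs=3,
                    decay_milestones=(10, 13, 15), decay_factor=0.2):
    """Return the learning rate for a 1-indexed training epoch.

    Hypothetical reconstruction of the paper's VQA 2.0 schedule:
    - linear warm-up over the first `warmup_epochs` epochs (assumed shape)
    - multiplicative step decay by `decay_factor` at each milestone epoch
    """
    if epoch <= warmup_epochs:
        # Linear ramp from base_lr/warmup_epochs up to base_lr.
        return base_lr * epoch / warmup_epochs
    lr = base_lr
    for milestone in decay_milestones:
        if epoch >= milestone:
            lr *= decay_factor
    return lr


# Example: print the schedule across the 13 VQA 2.0 training epochs.
for e in range(1, 14):
    print(e, vqa_lr_schedule(e))
```

The REC schedule from the same row would use the analogous call, e.g. `vqa_lr_schedule(e, decay_milestones=(60,), decay_factor=0.1)` over 90 epochs.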