Free VQA Models from Knowledge Inertia by Pairwise Inconformity Learning
Authors: Yiyi Zhou, Rongrong Ji, Jinsong Su, Xiangming Li, Xiaoshuai Sun (pp. 9316-9323)
AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To verify the proposed PIL, we plug it on a baseline VQA model as well as a set of recent VQA models, and conduct extensive experiments on two benchmark datasets, i.e., VQA1.0 and VQA2.0. |
| Researcher Affiliation | Academia | (1) Fujian Key Laboratory of Sensing and Computing for Smart City, Department of Cognitive Science, School of Information Science and Engineering, Xiamen University, China; (2) School of Software Engineering, Xiamen University, China; (3) Peng Cheng Laboratory, China |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | https://github.com/xiangmingLi/PIL |
| Open Datasets | Yes | VQA1.0 dataset contains 200,000 natural images from MS-COCO (Chen et al. 2015) with 614,153 human annotated questions in total. [...] VQA2.0 is developed based on VQA1.0, and has about 1,105,904 image-question pairs, of which 443,757 examples are for training, 214,254 for validation, and 447,793 for testing. |
| Dataset Splits | Yes | The whole dataset is divided into three splits, in which there are 248,349 examples for training, 121,512 for validation, and 244,302 for testing. [...] VQA2.0 is developed based on VQA1.0, and has about 1,105,904 image-question pairs, of which 443,757 examples are for training, 214,254 for validation, and 447,793 for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processors, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions software components such as 'Glove Embedding', the 'Adam optimizer', and an 'LSTM network', but does not provide specific version numbers for any libraries or frameworks used. |
| Experiment Setup | Yes | The dimension of the LSTM module is 2048, while the k and o in MFB fusion (Yu et al. 2017) are set to 5 and 1000, respectively. The dimensions of the last forward layer and the projections are set to 2048 and 300. The two hyper-parameters, α and β, are set to 0.25 and 0.01 after tuning. The initial learning rate is 7e-4, which is halved after every 25,000 steps. The batch size is 64 and the maximum training step is 150,000. The optimizer we used is Adam (Kingma and Ba 2014). |
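
For quick reference when reproducing the data pipeline, the split sizes reported in the paper can be captured in a small lookup table. This is a minimal illustrative sketch; the `VQA_SPLITS` name and `check_split_sizes` helper are hypothetical and not part of the released code, and the counts are taken verbatim from the paper's description.

```python
# Split sizes as reported in the paper (VQA1.0 and VQA2.0).
VQA_SPLITS = {
    "VQA1.0": {"train": 248_349, "val": 121_512, "test": 244_302},
    "VQA2.0": {"train": 443_757, "val": 214_254, "test": 447_793},
}


def check_split_sizes(dataset: str, split: str, loaded: int) -> None:
    """Warn if a loaded split does not match the size reported in the paper."""
    expected = VQA_SPLITS[dataset][split]
    if loaded != expected:
        print(f"{dataset} {split}: expected {expected} examples, got {loaded}")
```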
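
To make the reported training setup concrete, the following is a minimal sketch assuming a PyTorch-style training loop (the paper does not name its framework). The `config` dictionary, its key names, and the placeholder model are illustrative assumptions, not taken from the released code; only the hyperparameter values come from the paper.

```python
import torch
import torch.nn as nn

# Hyperparameters as reported: LSTM dim 2048, MFB k=5 / o=1000, forward and
# projection dims 2048 and 300, alpha=0.25, beta=0.01, Adam with lr 7e-4
# halved every 25,000 steps, batch size 64, 150,000 training steps.
config = {
    "lstm_dim": 2048,
    "mfb_k": 5,
    "mfb_o": 1000,
    "forward_dim": 2048,
    "projection_dim": 300,
    "alpha": 0.25,
    "beta": 0.01,
    "learning_rate": 7e-4,
    "batch_size": 64,
    "max_steps": 150_000,
    "lr_halving_interval": 25_000,
}

# Placeholder module; in practice this would be the baseline VQA model with PIL attached.
model = nn.Linear(config["forward_dim"], config["projection_dim"])

# Adam optimizer with the reported initial learning rate.
optimizer = torch.optim.Adam(model.parameters(), lr=config["learning_rate"])

# Halve the learning rate every 25,000 steps; scheduler.step() is called once per training step.
scheduler = torch.optim.lr_scheduler.StepLR(
    optimizer, step_size=config["lr_halving_interval"], gamma=0.5
)
```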