Towards Robust Multi-Modal Reasoning via Model Selection

Authors: Xiangyan Liu, Rongxue Li, Wei Ji, Tao Lin

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In the absence of suitable benchmarks, we create MS-GQA, a new dataset specifically designed to investigate the model selection challenge in multi-modal agents. Our experiments reveal that our framework enables dynamic model selection, considering both user inputs and subtask dependencies, thereby robustifying the overall reasoning process. Our code and benchmark: https://github.com/LINs-lab/M3.
Researcher Affiliation | Academia | Xiangyan Liu (3), Rongxue Li (2, 1), Wei Ji (3), Tao Lin (1); liu.xiangyan@u.nus.edu; lirongxue@westlake.edu.cn; weiji0523@gmail.com; lintao@westlake.edu.cn; (1) Westlake University, (2) Zhejiang University, (3) National University of Singapore
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code and benchmark: https://github.com/LINs-lab/M3.
Open Datasets | Yes | As our side contribution, we introduce the first benchmark, MS-GQA (Model Selection in GQA (Hudson & Manning, 2019)), to explore model selection methods in multi-modal reasoning scenarios. Our code and benchmark: https://github.com/LINs-lab/M3.
Dataset Splits | Yes | The MS-GQA dataset is split randomly into training, validation, and test sets with a 6:2:2 ratio. [See the split sketch after the table.]
Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run its experiments, such as GPU models, CPU models, or memory specifications.
Software Dependencies | No | The paper mentions software components and models such as the CCE loss, NCF, METAGL, GAT, blip-base-vqa, and bert-base-uncased, but does not provide version numbers for these or other software dependencies. [See the checkpoint-loading sketch after the table.]
Experiment Setup | Yes | Specifically, we explored hidden sizes [16, 32, 64, 128], learning rates [1e-2, 5e-3, 1e-3, 5e-3, 1e-4], weight decays [0.01, 0.001, 0.0001], and optimizer options [AdamW, Adam, SGD]. A batch size of 64 is utilized, along with a StepLR scheduler with step size 100 and gamma 0.7. The learning rate is adjusted within [1e-2, 5e-3, 1e-3, 5e-3, 1e-4], with a majority of the experiments using 1e-3. The weight decay is set to 0, and the batch size is set to 128. [See the training-setup sketch after the table.]
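
The Dataset Splits row reports a random 6:2:2 train/validation/test split of MS-GQA. Below is a minimal Python sketch of such a split, assuming the samples are available as a plain list; the function name and fixed seed are illustrative, not taken from the paper or its repository.

```python
import random

def split_ms_gqa(samples, seed=0):
    """Randomly split MS-GQA samples into train/val/test at a 6:2:2 ratio."""
    rng = random.Random(seed)          # fixed seed only to make the split repeatable
    shuffled = list(samples)           # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder (about 20%) goes to test
    return train, val, test
```

Slicing after a single shuffle keeps the three subsets disjoint and covers every sample exactly once.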
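
The Software Dependencies row notes that checkpoints such as blip-base-vqa and bert-base-uncased are named without versions. The sketch below shows how two such checkpoints are commonly loaded with Hugging Face transformers; the hub IDs and the use of the transformers library at all are assumptions on our part, since the paper pins neither package versions nor exact model sources.

```python
from transformers import (
    BertModel,
    BertTokenizer,
    BlipForQuestionAnswering,
    BlipProcessor,
)

# Assumed hub IDs; the paper only gives the informal names
# "blip-base-vqa" and "bert-base-uncased".
blip_processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
blip_vqa = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

bert_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert_encoder = BertModel.from_pretrained("bert-base-uncased")
```

Pinning the transformers and torch versions in a requirements file would remove the ambiguity this row flags.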
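
The Experiment Setup row quotes a hyperparameter grid, optimizer options, and a StepLR schedule. Below is a hedged PyTorch sketch of that configuration; only the quoted hyperparameter values come from the paper, while the placeholder model and the specific AdamW instance are assumptions for illustration.

```python
import itertools
import torch

# Search space as quoted in the paper (its learning-rate list repeats 5e-3;
# the duplicate is dropped here).
hidden_sizes = [16, 32, 64, 128]
learning_rates = [1e-2, 5e-3, 1e-3, 1e-4]
weight_decays = [0.01, 0.001, 0.0001]
optimizer_names = ["AdamW", "Adam", "SGD"]

# Full grid over the quoted options.
search_space = list(itertools.product(hidden_sizes, learning_rates,
                                      weight_decays, optimizer_names))

def build_optimizer(name, params, lr, weight_decay):
    """Map an optimizer name from the search space to a torch optimizer."""
    cls = {"AdamW": torch.optim.AdamW,
           "Adam": torch.optim.Adam,
           "SGD": torch.optim.SGD}[name]
    return cls(params, lr=lr, weight_decay=weight_decay)

# Placeholder model; the actual selector architecture is not reproduced here.
model = torch.nn.Linear(128, 2)

# The row reports that most runs use lr=1e-3 and weight decay 0.
optimizer = build_optimizer("AdamW", model.parameters(), lr=1e-3, weight_decay=0.0)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.7)

# During training, optimizer.step() runs per batch (size 64 or 128) and
# scheduler.step() once per epoch, decaying the learning rate by gamma=0.7
# every 100 steps of the scheduler; the paper does not state the step unit.
```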