Towards Robust Multi-Modal Reasoning via Model Selection
Authors: Xiangyan Liu, Rongxue Li, Wei Ji, Tao Lin
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the absence of suitable benchmarks, we create MS-GQA, a new dataset specifically designed to investigate the model selection challenge in multi-modal agents. Our experiments reveal that our framework enables dynamic model selection, considering both user inputs and subtask dependencies, thereby robustifying the overall reasoning process. Our code and benchmark: https://github.com/LINs-lab/M3. |
| Researcher Affiliation | Academia | Xiangyan Liu (3), Rongxue Li (2,1), Wei Ji (3), Tao Lin (1); liu.xiangyan@u.nus.edu; lirongxue@westlake.edu.cn; weiji0523@gmail.com; lintao@westlake.edu.cn. (1) Westlake University, (2) Zhejiang University, (3) National University of Singapore |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and benchmark: https://github.com/LINs-lab/M3. |
| Open Datasets | Yes | As our side contribution, we introduce the first benchmark, MS-GQA (Model Selection in GQA (Hudson & Manning, 2019)), to explore model selection methods in multi-modal reasoning scenarios. Our code and benchmark: https://github.com/LINs-lab/M3. |
| Dataset Splits | Yes | The dataset from MS-GQA is split randomly into training, validation, and test sets, with a 6:2:2 ratio. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run its experiments, such as GPU models, CPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions software components and models like 'CCE loss', 'NCF', 'METAGL', 'GAT', 'blip-base-vqa', and 'bert-base-uncased', but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Specifically, we explored hidden sizes [16, 32, 64, 128], learning rates [1e-2, 5e-3, 1e-3, 5e-3, 1e-4], weight decays [0.01, 0.001, 0.0001], and optimizer options [AdamW, Adam, SGD]. A batch size of 64 is utilized, along with a StepLR scheduler with step size 100 and gamma 0.7. The learning rate is adjusted within [1e-2, 5e-3, 1e-3, 5e-3, 1e-4], with a majority of the experiments using 1e-3. The weight decay is set to 0, and the batch size is set to 128. |
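
The 6:2:2 random split quoted in the "Dataset Splits" row can be reproduced with a few lines of standard-library Python. This is a minimal sketch under the assumption of a plain list of samples; the function name `split_dataset` and the fixed `seed` are illustrative and do not come from the authors' released code.

```python
import random

def split_dataset(samples, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle a list of samples and split it into train/val/test by the given ratios."""
    rng = random.Random(seed)                 # fixed seed only for reproducibility of the sketch
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_train = int(ratios[0] * len(samples))
    n_val = int(ratios[1] * len(samples))
    train = [samples[i] for i in indices[:n_train]]
    val = [samples[i] for i in indices[n_train:n_train + n_val]]
    test = [samples[i] for i in indices[n_train + n_val:]]
    return train, val, test
```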
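
The "Experiment Setup" row quotes a hyperparameter grid plus an AdamW optimizer and a StepLR schedule (step size 100, gamma 0.7). The sketch below shows how such a sweep could be wired up in PyTorch; the selector model (a two-layer MLP with input size 768) is a placeholder, and only the grid values, optimizer choice, and scheduler parameters are taken from the quoted description. The duplicated 5e-3 in the quoted learning-rate list is dropped here.

```python
import itertools
import torch

hidden_sizes = [16, 32, 64, 128]
learning_rates = [1e-2, 5e-3, 1e-3, 1e-4]   # unique values from the reported grid
weight_decays = [0.01, 0.001, 0.0001]

for hidden, lr, wd in itertools.product(hidden_sizes, learning_rates, weight_decays):
    model = torch.nn.Sequential(             # placeholder for the actual selector network
        torch.nn.Linear(768, hidden),
        torch.nn.ReLU(),
        torch.nn.Linear(hidden, 1),
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=wd)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.7)
    # ... train with batch size 64 and call scheduler.step() once per epoch ...
```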