reproducibilityindex.ai

Towards Robust Multi-Modal Reasoning via Model Selection

Authors: Xiangyan Liu, Rongxue Li, Wei Ji, Tao Lin

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In the absence of suitable benchmarks, we create MS-GQA, a new dataset specifically designed to investigate the model selection challenge in multi-modal agents. Our experiments reveal that our framework enables dynamic model selection, considering both user inputs and subtask dependencies, thereby robustifying the overall reasoning process. Our code and benchmark: https://github.com/LINs-lab/M3. Section 4: Experiments.
Researcher Affiliation	Academia	Xiangyan Liu³, Rongxue Li²,¹, Wei Ji³, Tao Lin¹; liu.xiangyan@u.nus.edu; lirongxue@westlake.edu.cn; weiji0523@gmail.com; lintao@westlake.edu.cn; ¹Westlake University, ²Zhejiang University, ³National University of Singapore
Pseudocode	No	The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Our code and benchmark: https://github.com/LINs-lab/M3.
Open Datasets	Yes	As our side contribution, we introduce the first benchmark, MS-GQA (Model Selection in GQA (Hudson & Manning, 2019)), to explore the model selection methods on multi-modal reasoning scenarios. Our code and benchmark: https://github.com/LINs-lab/M3.
Dataset Splits	Yes	The dataset from MS-GQA is split randomly into training, validation, and test sets, with a 6 : 2 : 2 ratio.
Hardware Specification	No	The paper does not explicitly describe the specific hardware used to run its experiments, such as GPU models, CPU models, or memory specifications.
Software Dependencies	No	The paper mentions software components and models like 'CCE loss', 'NCF', 'METAGL', 'GAT', 'blip-base-vqa', and 'bert-base-uncased', but does not provide specific version numbers for these or other software dependencies.
Experiment Setup	Yes	Specifically, we explored hidden sizes [16, 32, 64, 128], learning rates [1e-2, 5e-3, 1e-3, 5e-3, 1e-4], weight decays [0.01, 0.001, 0.0001], and optimizer options [AdamW, Adam, SGD]. A batch size of 64 is utilized, along with a StepLR scheduler with parameters step size 100 and gamma 0.7. The learning rate is adjusted within [1e-2, 5e-3, 1e-3, 5e-3, 1e-4], with a majority of the experiments using 1e-3. The weight decay is set to 0, and the batch size is set to 128.
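
The Dataset Splits row reports a random 6 : 2 : 2 split into training, validation, and test sets. A minimal sketch of such a split is shown below. The function name `split_indices`, the seed, and the rounding rule are assumptions made for illustration; the quoted text does not state how the authors seeded or rounded the split, and this is not taken from their repository.

```python
import random


def split_indices(n_samples, ratios=(6, 2, 2), seed=0):
    """Shuffle indices 0..n_samples-1 and cut them into train/val/test parts.

    Hypothetical helper: the seed and the integer rounding are assumptions,
    not details reported in the paper.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    total = sum(ratios)
    n_train = n_samples * ratios[0] // total
    n_val = n_samples * ratios[1] // total

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


train, val, test = split_indices(1000)
print(len(train), len(val), len(test))
```

Assigning the remainder to the last part keeps every sample in exactly one subset even when the sample count is not divisible by 10.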
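
The Experiment Setup row describes a hyperparameter search and a step-wise learning-rate schedule. The sketch below enumerates the quoted grid and evaluates the standard closed form of a step decay, lr = base_lr * gamma^floor(epoch / step_size), with step size 100 and gamma 0.7. The names `grid` and `step_lr` are hypothetical, the repeated 5e-3 in the quoted learning-rate list is listed only once, and whether steps are counted in epochs or iterations is not stated in the quoted text, so treat this as an illustration rather than the authors' configuration.

```python
from itertools import product

# Values quoted in the Experiment Setup row.
HIDDEN_SIZES = [16, 32, 64, 128]
LEARNING_RATES = [1e-2, 5e-3, 1e-3, 1e-4]  # quoted list repeats 5e-3; kept once here
WEIGHT_DECAYS = [0.01, 0.001, 0.0001]
OPTIMIZERS = ["AdamW", "Adam", "SGD"]


def grid():
    """Return every combination of the quoted search values as a list of dicts."""
    keys = ("hidden_size", "lr", "weight_decay", "optimizer")
    combos = product(HIDDEN_SIZES, LEARNING_RATES, WEIGHT_DECAYS, OPTIMIZERS)
    return [dict(zip(keys, combo)) for combo in combos]


def step_lr(base_lr, epoch, step_size=100, gamma=0.7):
    """Step decay: multiply the rate by gamma once every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


configs = grid()
print(len(configs))
for epoch in (0, 99, 100, 250):
    print(epoch, step_lr(1e-3, epoch))
```

With a base rate of 1e-3, the rate stays at 1e-3 for the first 100 steps and is then scaled by 0.7 at each subsequent multiple of 100.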