What Factors Affect Multi-Modal In-Context Learning? An In-Depth Exploration

Authors: Libo Qin, Qiguang Chen, Hao Fei, Zhi Chen, Min Li, Wanxiang Che

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To this end, we investigate extensive experiments on the three core steps of MM-ICL including demonstration retrieval, demonstration ordering, and prompt construction using 6 vision large language models and 20 strategies."
Researcher Affiliation | Collaboration | School of Computer Science and Engineering, Central South University; Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology; Tsinghua University; ByteDance
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | "The code for exploratory prompt work generally does not need to be released, and readers can easily use the prompts we report to directly reproduce the results."
Open Datasets | Yes | "Following the setting of Li et al. [2023c], we systematically explore 4 tasks, including image-caption, visual question answering (VQA), image classification, and chain-of-thought reasoning, which come from M3IT [Li et al., 2023c] and M3CoT [Chen et al., 2024b] (as shown in Table 2)... Table 2: Dataset in M3IT and M3CoT, where IC: Image Captioning, CLS: Classification, VQA: Visual Question Answering, R: Chain-of-Thought Reasoning (with NL rationale). Due to the cost, for each task, we evenly sampled 500 items according to the sub-dataset." (a sampling sketch follows the table)
Dataset Splits | No | The paper mentions a "validation dataset" for demonstration retrieval and samples 500 items per task for evaluation, but it does not give percentages or counts for training/validation/test splits of the datasets used, nor does it cite predefined splits for these experiments.
Hardware Specification | Yes | "In addition, all open source models complete inference on 2 A100 80G."
Software Dependencies | No | The paper mentions specific models and encoders (e.g., RoBERTa, CLIP-Vision Encoder, BridgeTower) but does not provide software dependencies with version numbers (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | "This baseline ranks samples based on similarity, with a delimiter and a 3-shot setting (see Appendix A for details). In addition, all open source models complete inference on 2 A100 80G. For all experiments, we select top-p from {0.95, 1} and adjust the temperature parameter within [0, 1]. Among them, temperature is the main error variable in this work." (a minimal sketch of this setup follows)
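
Reading the Research Type and Experiment Setup rows together, the paper's baseline runs the three MM-ICL steps (demonstration retrieval, demonstration ordering, prompt construction) with similarity-ranked demonstrations, a delimiter between examples, a 3-shot prompt, top-p chosen from {0.95, 1}, and temperature tuned within [0, 1]. The sketch below is one plausible reading of that setup, not the authors' released code: the `SentenceTransformer` encoder, the prompt template, and every function name here are illustrative assumptions (the paper itself compares retrievers built on RoBERTa, CLIP, and BridgeTower among its 20 strategies).

```python
# Minimal sketch of the reported MM-ICL baseline: similarity-based retrieval,
# ordering by similarity, delimiter-joined 3-shot prompt. All names below are
# placeholders, not the authors' implementation.
from dataclasses import dataclass

import numpy as np
from sentence_transformers import SentenceTransformer


@dataclass
class Example:
    image_path: str
    question: str
    answer: str


def retrieve_and_order(query, pool, encoder, k=3):
    """Steps 1-2: rank the candidate pool by text similarity to the query and
    keep the top-k demonstrations, most similar last (closest to the query)."""
    texts = [ex.question for ex in pool] + [query.question]
    emb = encoder.encode(texts, normalize_embeddings=True)
    sims = emb[:-1] @ emb[-1]        # cosine similarity of each candidate to the query
    top = np.argsort(sims)[-k:]      # ascending, so the last index is the most similar
    return [pool[i] for i in top]


def build_prompt(query, demos, delimiter="\n###\n"):
    """Step 3: prompt construction, joining demonstrations with a delimiter."""
    blocks = [f"<image:{d.image_path}>\nQ: {d.question}\nA: {d.answer}" for d in demos]
    blocks.append(f"<image:{query.image_path}>\nQ: {query.question}\nA:")
    return delimiter.join(blocks)


# Decoding settings reported in the paper: top-p from {0.95, 1}, temperature in [0, 1].
GEN_CONFIG = {"top_p": 0.95, "temperature": 0.2, "max_new_tokens": 128}
```

Whether the most similar demonstration sits first or last in the prompt is itself one of the ordering strategies the paper compares; the sketch pins a single choice only to keep the example concrete.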
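
The Open Datasets row reports that, for cost reasons, 500 items per task were "evenly sampled ... according to the sub-dataset". One natural reading is a per-sub-dataset quota, sketched below; the actual sampling script is not part of the paper, so the `sub_dataset` field and the quota logic are assumptions.

```python
import random
from collections import defaultdict


def sample_evenly(items, total=500, seed=0):
    """Sample `total` items from a task, spread (approximately) evenly over its
    sub-datasets. Each item is assumed to carry a 'sub_dataset' key; this is an
    illustrative reading of the paper's sampling, not the authors' script."""
    rng = random.Random(seed)
    by_sub = defaultdict(list)
    for item in items:
        by_sub[item["sub_dataset"]].append(item)
    quota, remainder = divmod(total, len(by_sub))
    sampled = []
    for i, (_, group) in enumerate(sorted(by_sub.items())):
        n = quota + (1 if i < remainder else 0)   # distribute the remainder across sub-datasets
        sampled.extend(rng.sample(group, min(n, len(group))))
    return sampled
```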