Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Mitigating Spurious Correlations in Zero-Shot Multimodal Models
Authors: Shenyu Lu, Junyi Chai, Xiaoqian Wang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted experiments on benchmark datasets, which have shown significant improvements in worst-group accuracy. Additionally, our visualizations of VLMs further demonstrate the effectiveness of this intervention. |
| Researcher Affiliation | Academia | Shenyu Lu, Junyi Chai & Xiaoqian Wang Elmore Family School of Electrical and Computer Engineering Purdue University West Lafayette, IN 47906, USA EMAIL |
| Pseudocode | Yes | We summarize our method in Algorithm 1. |
| Open Source Code | Yes | Code at https://github.com/lu876/TIE |
| Open Datasets | Yes | Datasets. We study five well-established benchmark datasets for spurious correlation research: Waterbirds (Koh et al., 2021; Sagawa et al., 2019), CelebA (Liu et al., 2015), ISIC (Codella et al., 2019), COVID-19 (Cohen et al., 2020), FMOW (Christie et al., 2018). |
| Dataset Splits | Yes | Following the protocol established by robust learning studies (Sagawa et al., 2019; Adila et al., 2024), we report three metrics: worst group accuracy (WG), average accuracy (Avg), and the gap between these two metrics (Gap). |
| Hardware Specification | Yes | We conducted all experiments on an Nvidia RTX 3090 GPU with 24 GB of memory, using frozen CLIP models across various datasets. |
| Software Dependencies | No | The paper mentions "Model construction and pre-trained weights are sourced from OpenCLIP (Ilharco et al., 2021)" and "We utilize GPT-4 (OpenAI, 2023)" but does not provide specific version numbers for these or other key software libraries like PyTorch, NumPy, or scikit-learn that would be necessary for reproduction. |
| Experiment Setup | Yes | The model was trained using an SGD optimizer with a learning rate of 10⁻⁴, a weight decay of 10⁻³, and a momentum of 0.9, over 200 epochs. |
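The Dataset Splits row quotes three evaluation metrics: worst-group accuracy (WG), average accuracy (Avg), and their gap (Gap). The sketch below shows one way to compute them, under two assumptions not stated in the table: Avg is the overall sample-weighted accuracy on the evaluation set, and Gap is Avg minus WG. The function name `group_metrics` and the group encoding (e.g. `(label, background)` pairs, as in Waterbirds) are hypothetical and are not taken from the authors' code.

```python
from collections import defaultdict


def group_metrics(preds, labels, groups):
    """Return (WG, Avg, Gap) as fractions in [0, 1].

    Assumptions (illustrative, not from the paper): Avg is the overall
    accuracy over all samples, and Gap = Avg - WG. Group ids can be any
    hashable value, e.g. (class label, spurious attribute) pairs.
    """
    if len(preds) == 0 or not (len(preds) == len(labels) == len(groups)):
        raise ValueError("preds, labels and groups must be non-empty and equal length")

    correct = defaultdict(int)
    total = defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)

    # Per-group accuracy; the lowest value is the worst-group accuracy.
    group_acc = {g: correct[g] / total[g] for g in total}
    wg = min(group_acc.values())
    # Overall accuracy, weighting every sample equally.
    avg = sum(correct.values()) / len(preds)
    return wg, avg, avg - wg
```

Because WG is a minimum over groups, a single small minority group (such as waterbirds on land backgrounds) can dominate it even when Avg is high, which is why the gap is reported alongside both numbers.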
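The Experiment Setup row reports SGD with a learning rate of 10⁻⁴, weight decay of 10⁻³, momentum of 0.9, and 200 epochs. A minimal sketch of the corresponding update rule follows, assuming weight decay is applied as an L2 term added to the gradient and momentum is plain heavy-ball momentum (the convention documented for PyTorch's `torch.optim.SGD` with `dampening=0` and `nesterov=False`). The table does not say which parameters are trained or what loss is used, so the quadratic objective in `train_toy` is a hypothetical stand-in, and both function names are illustrative.

```python
def sgd_momentum_step(params, grads, velocity, lr=1e-4, weight_decay=1e-3, momentum=0.9):
    """One SGD step with coupled L2 weight decay and heavy-ball momentum.

    Assumed update (PyTorch-style, dampening=0, nesterov=False):
        d = g + weight_decay * w
        v = momentum * v + d
        w = w - lr * v
    """
    new_params, new_velocity = [], []
    for w, g, v in zip(params, grads, velocity):
        d = g + weight_decay * w       # add the gradient of the L2 penalty
        v = momentum * v + d           # update the momentum buffer
        new_params.append(w - lr * v)  # step along the buffered direction
        new_velocity.append(v)
    return new_params, new_velocity


def train_toy(epochs=200):
    """Run one step per epoch on a hypothetical loss f(w) = 0.5 * (w - 3)^2."""
    params, velocity = [0.0], [0.0]
    for _ in range(epochs):
        grads = [params[0] - 3.0]  # gradient of the toy loss
        params, velocity = sgd_momentum_step(params, grads, velocity)
    return params[0]
```

With a learning rate this small, the toy parameter only moves part of the way toward its minimizer in 200 steps; in a real run each epoch would contain many minibatch steps rather than one.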