Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

ERICT: Enhancing Robustness by Identifying Concept Tokens in Zero-Shot Vision Language Models

Authors: Xinpeng Dong, Min Zhang, Didi Zhu, Ye Jun Jian, Zhang Keli, Aimin Zhou, Fei Wu, Kun Kuang

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments show that ERICT significantly improves the overall performance, including that of the worst group, and achieves new state-of-the-art results. (Section: Abstract) 6. Experiments
Researcher Affiliation	Collaboration	(1) Department of Computer Science and Technology, Zhejiang University, Hangzhou, China; (2) East China Normal University; (3) Huawei Noah's Ark Lab.
Pseudocode	Yes	C. Pseudocode. Algorithm 1: Step 1 of ERICT-C. Algorithm 2: Step 2 of ERICT-C.
Open Source Code	No	The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available or provided in supplementary materials.
Open Datasets	Yes	We evaluate our approach on three widely used spurious correlation datasets, including Waterbirds (Sagawa et al., 2019), CelebA (Liu et al., 2015), and UrbanCars (Li et al., 2023b). ImageNet (Deng et al., 2009) is a widely used large-scale vision dataset containing more than 14 million images covering 1,000 categories.
Dataset Splits	Yes	For Waterbirds and CelebA, we follow the setting of previous works (Sarridis et al., 2024; Yang et al., 2024; You et al., 2024).
Hardware Specification	Yes	All of our experiments are conducted on a single NVIDIA GeForce RTX 4090 GPU.
Software Dependencies	No	All the images were generated using the default t-SNE parameters from the scikit-learn package. This mention of "scikit-learn package" does not include a specific version number, and no other software dependencies with version numbers are provided.
Experiment Setup	Yes	The temperature parameter controls the sharpness of the similarity score matrix distribution, thereby influencing the mask ratio during the inference phase. For ERICT, we use an auxiliary prompt x^a_t for every task and get the auxiliary text feature by the text encoder. For ERICT-C, whose auxiliary embedding can be obtained through aggregating class prompt embeddings. When the dataset contains a large number of class (e.g., ImageNet), ERICT-C adopt a top-K strategy.
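
The Experiment Setup row mentions two mechanisms: a temperature that controls how sharp the similarity score distribution is, and a top-K strategy for aggregating class prompt embeddings. The sketch below illustrates both in plain Python. It is not the authors' implementation (the record above notes that no source code is available). The function names, the use of a softmax, mean aggregation, and ranking classes by a similarity score are all assumptions made for illustration, and the masking rule itself is not modeled.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Softmax over scores / temperature (hypothetical helper).

    A smaller temperature concentrates probability mass on the largest
    scores, i.e. it yields a sharper distribution.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_top_k(class_embeddings, class_scores, k):
    """Average the k class embeddings with the highest scores.

    Mean aggregation and score-based ranking are assumptions; the source
    only says that class prompt embeddings are aggregated and that a
    top-K strategy is used when there are many classes.
    """
    if not 1 <= k <= len(class_embeddings):
        raise ValueError("k must be between 1 and the number of classes")
    ranked = sorted(
        range(len(class_scores)),
        key=lambda i: class_scores[i],
        reverse=True,
    )[:k]
    dim = len(class_embeddings[0])
    return [sum(class_embeddings[i][d] for i in ranked) / k for d in range(dim)]


if __name__ == "__main__":
    scores = [0.9, 0.5, 0.4, 0.1]
    print(softmax_with_temperature(scores, 1.0))  # relatively flat
    print(softmax_with_temperature(scores, 0.1))  # much sharper
    embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(aggregate_top_k(embeddings, [0.2, 0.9, 0.5], k=2))
```

Under these assumptions, lowering the temperature raises the largest probability, which is one way a temperature could change how many tokens fall past a masking threshold.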