Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
ERICT: Enhancing Robustness by Identifying Concept Tokens in Zero-Shot Vision Language Models
Authors: Xinpeng Dong, Min Zhang, Didi Zhu, Ye Jun Jian, Zhang Keli, Aimin Zhou, Fei Wu, Kun Kuang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that ERICT significantly improves the overall performance, including that of the worst group, and achieves new state-of-the-art results. (Sections: Abstract; 6. Experiments) |
| Researcher Affiliation | Collaboration | ¹Department of Computer Science and Technology, Zhejiang University, Hangzhou, China; ²East China Normal University; ³Huawei Noah's Ark Lab. |
| Pseudocode | Yes | Section C. Pseudocode: Algorithm 1, Step 1 of ERICT-C; Algorithm 2, Step 2 of ERICT-C. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available or provided in supplementary materials. |
| Open Datasets | Yes | We evaluate our approach on three widely used spurious correlation datasets, including Waterbirds (Sagawa et al., 2019), CelebA (Liu et al., 2015), and UrbanCars (Li et al., 2023b). ImageNet (Deng et al., 2009) is a widely used large-scale vision dataset containing more than 14 million images covering 1,000 categories. |
| Dataset Splits | Yes | For Waterbirds and CelebA, we follow the setting of previous works (Sarridis et al., 2024; Yang et al., 2024; You et al., 2024). |
| Hardware Specification | Yes | All of our experiments are conducted on a single NVIDIA GeForce RTX 4090 GPU. |
| Software Dependencies | No | All the images were generated using the default t-SNE parameters from the scikit-learn package. This mention of "scikit-learn package" does not include a specific version number, and no other software dependencies with version numbers are provided. |
| Experiment Setup | Yes | The temperature parameter controls the sharpness of the similarity score matrix distribution, thereby influencing the mask ratio during the inference phase. For ERICT, we use an auxiliary prompt $x_t^a$ for every task and obtain the auxiliary text feature with the text encoder. For ERICT-C, the auxiliary embedding can be obtained by aggregating class prompt embeddings. When the dataset contains a large number of classes (e.g., ImageNet), ERICT-C adopts a top-K strategy. |
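The Experiment Setup row names three mechanisms: a temperature that sharpens the similarity-score distribution and thereby affects the mask ratio, an ERICT-C auxiliary embedding built by aggregating class prompt embeddings, and a top-K strategy for datasets with many classes. The NumPy sketch below illustrates one way these pieces could fit together. It is not the authors' implementation (the Open Source Code row reports that none is available); the function names, the cosine-similarity scoring of patch tokens, the 1/N masking threshold, and the choice to average the K class embeddings most similar to the image feature are all hypothetical assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit L2 norm along `axis`."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def aggregate_class_prompts(class_text_feats, image_feat, top_k=None):
    """Hypothetical ERICT-C-style auxiliary embedding (assumption, not the paper's code).

    Averages the normalized class prompt embeddings. If `top_k` is given,
    only the K classes most similar to the global image feature are averaged,
    an assumed reading of the top-K strategy for large label sets.
    """
    class_text_feats = l2_normalize(class_text_feats)
    if top_k is not None and top_k < len(class_text_feats):
        sims = class_text_feats @ l2_normalize(image_feat)
        idx = np.argsort(sims)[::-1][:top_k]  # indices of the K most similar classes
        class_text_feats = class_text_feats[idx]
    return l2_normalize(class_text_feats.mean(axis=0))


def concept_token_mask(patch_feats, aux_text_feat, temperature=0.01):
    """Hypothetical masking rule (assumption): flag patch tokens whose
    temperature-scaled softmax score against the auxiliary text feature
    exceeds the uniform level 1/N. Returns a boolean array, True = mask."""
    sims = l2_normalize(patch_feats) @ l2_normalize(aux_text_feat)  # shape (N,)
    logits = sims / temperature
    logits = logits - logits.max()  # subtract max for numerical stability
    weights = np.exp(logits)
    probs = weights / weights.sum()
    return probs > 1.0 / len(probs)


# Usage with random stand-in features (no real encoder involved).
rng = np.random.default_rng(0)
patches = rng.normal(size=(49, 512))  # e.g. a 7x7 grid of patch tokens
aux = rng.normal(size=512)
for tau in (1.0, 0.1, 0.01):
    mask = concept_token_mask(patches, aux, tau)
    print(f"temperature={tau}: mask ratio={mask.mean():.3f}")
```

With the 1/N threshold, a token is flagged exactly when its similarity exceeds the temperature-scaled log-mean-exp of all similarities. That quantity rises from the mean toward the maximum as the temperature decreases, so in this sketch a lower temperature never flags more tokens, which is one concrete way a temperature can govern the mask ratio.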