Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Breaking Correlation Shift via Conditional Invariant Regularizer

Authors: Mingyang Yi, Ruoyu Wang, Jiacheng Sun, Zhenguo Li, Zhi-Ming Ma

ICLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive empirical results verify our algorithm's efficacy in improving OOD generalization. Concretely, we conduct experiments on benchmark classification datasets CelebA (Liu et al., 2015), Waterbirds (Sagawa et al., 2019), MultiNLI (Williams et al., 2018), and CivilComments (Borkan et al., 2019). Empirical results show that our algorithm consistently improves the model's generalization on OOD data with correlation shifts.
Researcher Affiliation	Collaboration	Mingyang Yi1,2,3, Ruoyu Wang1,2, Jiacheng Sun3, Zhenguo Li3, Zhi-Ming Ma1,2 1University of Chinese Academy of Sciences EMAIL 2Academy of Mathematics and Systems Science, Chinese Academy of Sciences EMAIL 3Huawei Noah's Ark Lab EMAIL
Pseudocode	Yes	Algorithm 1 Regularize training with CSV. Input: Training set $\{(x_i, y_i)\}_{i=1}^{n}$, number of labels $K_y$ and spurious attributes $K_z$, training steps $T$, model $f_\theta(\cdot)$ parameterized by $\theta$. Initialized $\theta_0$, $\{F^k_0\}$. Positive regularization constant $\lambda$, surrogate constant $\rho$, and correction constant $\gamma$. Estimators $\hat{R}_{emp}(f_\theta, P)$ to $R_{emp}(f_\theta, P)$, $\hat{F}^k(\theta)$ to $F^k(\theta)$.
Open Source Code	No	The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets	Yes	Concretely, we conduct experiments on benchmark classification datasets CelebA (Liu et al., 2015), Waterbirds (Sagawa et al., 2019), MultiNLI (Williams et al., 2018), and CivilComments (Borkan et al., 2019).
Dataset Splits	Yes	The numbers of samples in training and test dataset from the 4 groups are respectively {71629, 9767}, {66874, 7535}, {22880, 2880}, {1387, 180}. Our goal is to train a model that correctly recognizes the hair color of celebrities independent of their gender.
Hardware Specification	No	The paper mentions using "ResNet-50 pre-trained on ImageNet" and "pre-trained BERT Base model" as backbone models but does not specify the hardware (e.g., GPU/CPU models, memory) used for their experiments.
Software Dependencies	No	The paper mentions optimizers like "AdamW", "Adam", and "SGD" but does not specify software dependencies with version numbers (e.g., PyTorch 1.x, Python 3.x).
Experiment Setup	Yes	The hyperparameters are in Appendix G.4. ... The hyperparameters of the proposed RCSV and RCSVU on CelebA, Waterbirds, MultiNLI, CivilComments, Toy example and C-MNIST respectively summarized in Table 10, 11, 12, 13, 14, and 15.
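
The group counts quoted in the Dataset Splits row can be turned into per-group shares to see how unevenly the four groups are represented. The following is a minimal sketch, not code from the paper: the labels `group_1` to `group_4` and the helper `group_shares` are placeholders introduced here, since the quoted text does not name the groups, and the counts are copied as quoted.

```python
# Train/test sample counts for the 4 groups, copied from the Dataset Splits row.
# The labels "group_1".."group_4" are hypothetical; the quoted text does not name the groups.
GROUP_COUNTS = {
    "group_1": (71629, 9767),
    "group_2": (66874, 7535),
    "group_3": (22880, 2880),
    "group_4": (1387, 180),
}


def group_shares(counts, split_index):
    """Return each group's fraction of one split (0 = training, 1 = test)."""
    total = sum(pair[split_index] for pair in counts.values())
    return {name: pair[split_index] / total for name, pair in counts.items()}


train_shares = group_shares(GROUP_COUNTS, 0)
test_shares = group_shares(GROUP_COUNTS, 1)

if __name__ == "__main__":
    for name in GROUP_COUNTS:
        print(f"{name}: train {train_shares[name]:.2%}, test {test_shares[name]:.2%}")
```

With these counts, the smallest group accounts for under 1% of the training samples, which is the kind of imbalance that makes a spurious correlation easy for a model to exploit.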