Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Mitigating Spurious Features in Contrastive Learning with Spectral Regularization

Authors: Naghmeh Ghanooni, Waleed Mustafa, Dennis Wagner, Sophie Fellenz, Anthony Lin, Marius Kloft

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical results on SimCLR and SimSiam demonstrate consistent gains in robustness and transfer performance, suggesting broad applicability across self-supervised learning paradigms. Code: GitHub repository. [...] Experiments across five challenging spurious-correlation benchmarks show that the method substantially improves worst-group accuracy (performance on the most difficult portion of the dataset) while also increasing average accuracy, and achieves new state-of-the-art results on multiple downstream tasks.
Researcher Affiliation	Academia	Naghmeh Ghanooni, Waleed Mustafa, Dennis Wagner, Sophie Fellenz, and Marius Kloft: Department of Computer Science, RPTU Kaiserslautern, Germany. Anthony Widjaja Lin: Max Planck Institute for Software Systems, Kaiserslautern, Germany.
Pseudocode	Yes	Pseudocode for computing the regularization term is provided in Appendix E. [...] Algorithm 1: Self-supervised Contrastive Pretraining with Spectrum Regularization [...] Algorithm 2: Spectrum Flattening Loss Computation (L_spec)
Open Source Code	Yes	Code: GitHub repository.
Open Datasets	Yes	We evaluate all methods on five widely used vision benchmarks designed to study spurious correlations. Among them, SpurCIFAR-10 [Nagarajan et al., 2020] and C-MNIST [Arjovsky et al., 2019] are synthetic datasets... In CelebA [Liu et al., 2015], gender... MetaShift [Liang and Zou, 2022] explores... Finally, Waterbirds [Sagawa et al., 2019] contains... All datasets used in this work are publicly available, and the code will be submitted prior to the supplementary material deadline.
Dataset Splits	Yes	To ensure a balanced training dataset, we subsample the majority groups [Sagawa et al., 2020, Idrissi et al., 2022], helping to mitigate geometric biases in the linear classifier [Nagarajan et al., 2020]. Finally, we evaluate the learned representations on the standard test split of each dataset, leveraging group information to report both average accuracy and worst-group accuracy.
Hardware Specification	Yes	All experiments using the spectral flattening regularizer were run on a single NVIDIA A100 GPU with 40 GB memory.
Software Dependencies	No	The paper does not specify any particular software libraries or frameworks with their version numbers (e.g., Python, PyTorch, TensorFlow versions) used for implementing the methodology.
Experiment Setup	Yes	Table 4: Hyperparameter settings and encoder architectures for SimCLR pretraining (columns: Dataset, Encoder, Learning Rate, Batch Size, Weight Decay, Epochs, Regularizer α). [...] Detailed hyperparameter configurations for SimCLR across all datasets are provided in Table 4. To select the regularization strength α for the spectral flattening loss, we performed a grid search over the values {0.001, 0.005, 0.01, 0.05}, using worst-group accuracy on the validation set as the selection criterion.
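The Pseudocode row cites Algorithm 2 for computing L_spec, but this entry does not reproduce it. As a hedged illustration of what "flattening" an embedding spectrum can mean, the Python sketch below scores a batch of embeddings by how unevenly variance is spread across the eigenvalues of their covariance. The specific penalty (d * sum(lambda_i^2) / (sum(lambda_i))^2 - 1, a participation-ratio form) and the function names `covariance` and `spectral_flatness_penalty` are assumptions for illustration, not the authors' loss.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def covariance(embeddings: Sequence[Sequence[float]]) -> Matrix:
    """Unbiased sample covariance of a batch of d-dimensional embeddings.

    Requires at least two embeddings (divides by n - 1).
    """
    n, d = len(embeddings), len(embeddings[0])
    mean = [sum(z[j] for z in embeddings) / n for j in range(d)]
    centered = [[z[j] - mean[j] for j in range(d)] for z in embeddings]
    return [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def spectral_flatness_penalty(embeddings: Sequence[Sequence[float]]) -> float:
    """Hypothetical spectrum-flattening penalty (NOT the paper's L_spec).

    For a symmetric covariance C with eigenvalues lambda_1..lambda_d:
      trace(C)  = sum(lambda_i)
      ||C||_F^2 = sum(lambda_i ** 2)
    so no eigendecomposition is needed. By Cauchy-Schwarz,
    d * sum(lambda_i**2) >= (sum(lambda_i))**2, hence the value is >= 0
    and equals 0 exactly when all eigenvalues are equal (a flat spectrum).
    It reaches d - 1 when all variance lies in a single direction.
    Assumes the embeddings are not all identical (trace > 0).
    """
    c = covariance(embeddings)
    d = len(c)
    trace = sum(c[i][i] for i in range(d))
    frobenius_sq = sum(v * v for row in c for v in row)
    return d * frobenius_sq / (trace * trace) - 1.0
```

A batch whose covariance is a multiple of the identity scores 0, while a two-dimensional batch collapsed onto one axis scores 1. In a pretraining loop one would presumably add α times such a term to the SimCLR or SimSiam objective, with α drawn from the grid in the Experiment Setup row; whether the paper combines the terms this way is not stated in this entry.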
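The Dataset Splits row describes balancing the training data by subsampling the majority groups, following Sagawa et al. [2020] and Idrissi et al. [2022]. The sketch below is a minimal version under two assumptions: each example carries a group identifier (for example, a class-label and spurious-attribute pair), and every group is downsampled to the size of the smallest one. The function name `subsample_to_smallest_group` and the fixed seed are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Example = Tuple[object, Hashable]  # (example, group_id)


def subsample_to_smallest_group(
    examples: Sequence[Example], seed: int = 0
) -> List[Example]:
    """Downsample every group to the size of the smallest group.

    Hypothetical helper: group_id is assumed to encode, e.g., the
    (class label, spurious attribute) combination of each example.
    """
    by_group: Dict[Hashable, List[Example]] = defaultdict(list)
    for item in examples:
        by_group[item[1]].append(item)

    target = min(len(items) for items in by_group.values())
    rng = random.Random(seed)  # fixed seed so the subset is reproducible

    balanced: List[Example] = []
    for items in by_group.values():  # dicts preserve insertion order
        balanced.extend(rng.sample(items, target))
    return balanced
```

Downsampling, rather than reweighting, gives the linear classifier an equal number of examples from every group, at the cost of discarding majority-group data.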
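Worst-group accuracy, which the Dataset Splits row reports alongside average accuracy and the Experiment Setup row uses to choose α, is the minimum of the per-group accuracies. The sketch below computes both metrics and picks α from the grid {0.001, 0.005, 0.01, 0.05} quoted in the entry. It assumes a hypothetical `evaluate(alpha)` callback that trains with that α and returns validation predictions, labels, and group ids; average accuracy is taken here as plain accuracy over all examples, which may differ from the paper's exact definition.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Sequence, Tuple

Predictions = Tuple[Sequence[int], Sequence[int], Sequence[Hashable]]


def group_accuracies(
    preds: Sequence[int], labels: Sequence[int], groups: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Accuracy computed separately within each group."""
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)
    return {g: correct[g] / total[g] for g in total}


def average_and_worst_group_accuracy(
    preds: Sequence[int], labels: Sequence[int], groups: Sequence[Hashable]
) -> Tuple[float, float]:
    """Return (accuracy over all examples, minimum per-group accuracy)."""
    average = sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)
    worst = min(group_accuracies(preds, labels, groups).values())
    return average, worst


def select_alpha(
    grid: Sequence[float], evaluate: Callable[[float], Predictions]
) -> float:
    """Return the alpha whose validation run has the highest worst-group accuracy.

    `evaluate` is a hypothetical callback that pretrains with the given alpha
    and returns (predictions, labels, group_ids) on the validation split.
    Ties go to the earliest alpha in `grid`.
    """
    return max(grid, key=lambda a: average_and_worst_group_accuracy(*evaluate(a))[1])
```

Selecting on worst-group rather than average accuracy favors the α that helps the hardest group, matching the criterion stated in the Experiment Setup row, and running the selection on the validation split leaves the test split untouched for the final report.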