Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Provable Domain Generalization via Invariant-Feature Subspace Recovery
Authors: Haoxiang Wang, Haozhe Si, Bo Li, Han Zhao
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, our ISRs can obtain superior performance compared with IRM on synthetic benchmarks. In addition, on three real-world image and text datasets, we show that both ISRs can be used as simple yet effective post-processing methods to improve the worst-case accuracy of (pre-)trained models against spurious correlations and group shifts. ... We conduct experiments on both synthetic and real datasets to examine our proposed algorithms. |
| Researcher Affiliation | Academia | Haoxiang Wang¹, Haozhe Si¹, Bo Li¹, Han Zhao¹. ¹University of Illinois at Urbana-Champaign, Urbana, IL, USA. Correspondence to: Haoxiang Wang <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: ISR-Mean; Algorithm 2: ISR-Cov |
| Open Source Code | Yes | The code is released at https://github.com/Haoxiang-Wang/ISR. |
| Open Datasets | Yes | Waterbirds (Sagawa et al., 2019): This is a image dataset built from the CUB (Wah et al., 2011) and Places (Zhou et al., 2017) datasets. ... CelebA (Liu et al., 2015): This is a celebrity face dataset ... MultiNLI (Williams et al., 2017): This is a text dataset for natural language inference. |
| Dataset Splits | Yes | We choose the hyper-parameters that minimize the mean error over the validation split of all environments. ... early stop models at the epoch with the highest worst-group validation accuracy. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using "Adam optimizer" and "logistic regression solver provided in scikit-learn", but does not specify version numbers for these software components or any other libraries. |
| Experiment Setup | Yes | for 10K full-batch Adam ... iterations. ... We choose the hyper-parameters that minimize the mean error over the validation split of all environments. ... early stop models at the epoch with the highest worst-group validation accuracy. ... Adopting the same hyperparameter as that of Table 1 |
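
The model-selection rule quoted in the Dataset Splits and Experiment Setup rows (early stopping at the epoch with the highest worst-group validation accuracy) can be illustrated with a minimal sketch. This is not the authors' code: the function names `worst_group_accuracy` and `select_epoch`, the list-based inputs, and the tie-breaking toward the earliest epoch are all assumptions made for illustration.

```python
from collections import defaultdict


def worst_group_accuracy(y_true, y_pred, groups):
    """Return the lowest per-group accuracy (hypothetical helper)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for label, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(label == pred)
    return min(correct[g] / total[g] for g in total)


def select_epoch(val_preds_per_epoch, y_true, groups):
    """Pick the epoch whose validation predictions maximize worst-group accuracy.

    Assumption: ties are broken in favor of the earliest epoch.
    """
    scores = [
        worst_group_accuracy(y_true, preds, groups)
        for preds in val_preds_per_epoch
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy validation split with two groups, "a" and "b" (illustrative data only).
y_true = [0, 0, 1, 1]
groups = ["a", "a", "b", "b"]
val_preds_per_epoch = [
    [0, 0, 0, 0],  # group "b" is entirely wrong
    [0, 1, 1, 1],  # group "a" is half right
    [0, 0, 1, 0],  # group "b" is half right
]
best_epoch, scores = select_epoch(val_preds_per_epoch, y_true, groups)
```

Selecting on the minimum over groups, instead of the mean over all examples, penalizes checkpoints that do well on average while failing on one group.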