Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Spread Spurious Attribute: Improving Worst-group Accuracy with Spurious Attribute Estimation
Authors: Junhyun Nam, Jaehyung Kim, Jaeho Lee, Jinwoo Shin
ICLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on various benchmark datasets show that our algorithm consistently outperforms the baseline methods using the same number of group-labeled samples. |
| Researcher Affiliation | Academia | Junhyun Nam¹, Jaehyung Kim¹, Jaeho Lee², Jinwoo Shin¹ (¹KAIST, ²POSTECH) |
| Pseudocode | Yes | Algorithm 1 Spread Spurious Attribute |
| Open Source Code | Yes | Also, we provide our source code as a part of the open-to-public supplementary materials. |
| Open Datasets | Yes | Waterbirds (Sagawa et al., 2020)... Caltech-UCSD Birds dataset (Wah et al., 2011) with landscapes from Places (Zhou et al., 2017), CelebA (Liu et al., 2015), MultiNLI (Williams et al., 2018), CivilComments-WILDS (Borkan et al., 2019; Koh et al., 2021), CIFAR-10 (Krizhevsky et al., 2009). |
| Dataset Splits | Yes | For all datasets, we use the validation split of the dataset as the group-labeled set. We use D_L, D_U to train the spurious attribute predictor that makes predictions on D_U, and validate the model with D_L. |
| Hardware Specification | Yes | In Table 10, we provide the time required for the pseudo-labeling phase and the robust training phase on a single Nvidia Titan XP for each dataset. |
| Software Dependencies | No | The paper mentions software like torchvision and huggingface implementations, and optimizers like SGD and AdamW, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | For Waterbirds and CelebA, we tuned the learning rate over {1e-3, 1e-4, 1e-5} and ℓ2 regularization over {1e-1, 1e-4}. We used SGD optimizer with momentum 0.9 and batch size 64. In pseudo-labeling phase, we train the spurious attribute predictor 1k iterations for Waterbirds and 45k iterations for CelebA. |
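
The search space quoted in the Experiment Setup row can be written out as an explicit grid. The following is a minimal sketch, not the authors' code. It assumes the first learning-rate value is 1e-3 and that ℓ2 regularization corresponds to a weight-decay setting. The names `build_configs`, `FIXED`, and `PSEUDO_LABEL_ITERS` are hypothetical and used only for illustration.

```python
from itertools import product

# Search space quoted in the Experiment Setup row (Waterbirds and CelebA).
# Assumption: the first learning rate is 1e-3.
LEARNING_RATES = [1e-3, 1e-4, 1e-5]
L2_REGULARIZATION = [1e-1, 1e-4]

# Fixed settings quoted in the same row.
FIXED = {"optimizer": "SGD", "momentum": 0.9, "batch_size": 64}

# Pseudo-labeling iterations per dataset, as quoted (1k and 45k).
PSEUDO_LABEL_ITERS = {"Waterbirds": 1_000, "CelebA": 45_000}


def build_configs(dataset):
    """Return one config dict per (learning rate, l2) pair for a dataset."""
    return [
        {
            **FIXED,
            "dataset": dataset,
            "lr": lr,
            # Assumption: l2 regularization is applied as weight decay.
            "weight_decay": wd,
            "pseudo_label_iters": PSEUDO_LABEL_ITERS[dataset],
        }
        for lr, wd in product(LEARNING_RATES, L2_REGULARIZATION)
    ]


configs = build_configs("Waterbirds")
print(len(configs))  # 3 learning rates x 2 regularization values
```

Each dictionary describes one candidate run. How the best configuration is selected is not stated in the table above, so the sketch stops at enumerating the grid.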