Simple and Fast Group Robustness by Automatic Feature Reweighting
Authors: Shikai Qiu, Andres Potapczynski, Pavel Izmailov, Andrew Gordon Wilson
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate AFR on a range of benchmarks and provide detailed ablations on design decisions and hyperparameters. ... We report the results for AFR and the baselines in Table 1, showing that AFR outperforms or is competitive with the best reported results among methods trained without access to the spurious attributes on all datasets except CelebA. |
| Researcher Affiliation | Academia | New York University. Correspondence to: Shikai Qiu <sq2129@nyu.edu>, Andres Potapczynski <ap6604@nyu.edu>, Pavel Izmailov <pi390@nyu.edu>, Andrew Gordon Wilson <andrewgw@cims.nyu.edu>. |
| Pseudocode | Yes | Algorithm 1 Automatic Feature Reweighting |
| Open Source Code | Yes | Code for AFR is available at https://github.com/AndPotap/afr. |
| Open Datasets | Yes | We consider several image and text classification problems. For more details, see Appendix B. ... Waterbirds (Sagawa et al., 2020), CelebA (Liu et al., 2015), MultiNLI (Williams et al., 2017), CivilComments (Borkan et al., 2019), CXR (Yang et al., 2022). |
| Dataset Splits | Yes | We find that splitting the training set in an 80%/20% proportion to construct D_ERM and D_RW works well in practice, but we show that performance is not particularly sensitive to the split in Appendix C.6. ... Following all other group robustness methods, we use validation WGA to tune AFR's γ and λ and perform early stopping (see Appendix B for details). |
| Hardware Specification | Yes | Timings for AFR and JTT in Figure 3 are obtained by running on a single RTX8000 (48 GB) NVIDIA GPU. |
| Software Dependencies | No | The paper mentions software components like "torchvision.models.resnet50" (implying PyTorch) and "BERT (using Hugging Face)" but does not provide specific version numbers for these libraries or frameworks, which are necessary for reproducible software dependencies. |
| Experiment Setup | Yes | AFR 1st stage: epochs = 50, optimizer = SGD, scheduler = cosine, batch size = 32, learning rate = 3e-3, weight decay = 1e-4. AFR 2nd stage: epochs = 500, γ from 33 points linearly spaced in [4, 20], learning rate = 1e-2, λ ∈ {0, 0.1, 0.2, 0.3, 0.4}. |
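The Dataset Splits row reports an 80%/20% partition of the training data into an ERM subset (D_ERM) and a reweighting subset (D_RW). The following is a minimal sketch of such a split, assuming a simple random partition; the function and variable names (`split_train_set`, `d_erm`, `d_rw`) are illustrative and the paper's exact splitting procedure may differ.

```python
import torch
from torch.utils.data import random_split

def split_train_set(train_dataset, erm_frac=0.8, seed=0):
    """Partition the training set into an ERM subset and a reweighting subset."""
    n_erm = int(erm_frac * len(train_dataset))
    gen = torch.Generator().manual_seed(seed)
    d_erm, d_rw = random_split(
        train_dataset, [n_erm, len(train_dataset) - n_erm], generator=gen
    )
    return d_erm, d_rw
```

The Pseudocode and Experiment Setup rows refer to Algorithm 1 and the second-stage hyperparameters (γ, λ, learning rate, epochs). Below is a hedged PyTorch sketch of that stage, assuming per-example weights of the form exp(-γ p̂_i), where p̂_i is the first-stage model's probability of the correct class, and an ℓ2 penalty of strength λ keeping the retrained last layer close to the ERM head. Names such as `afr_second_stage`, `embeddings`, and `erm_head` are illustrative rather than taken from the released code, and details like weight normalization and class balancing are simplified.

```python
import torch
import torch.nn.functional as F

def afr_second_stage(embeddings, labels, erm_head, gamma, lam=0.1, lr=1e-2, epochs=500):
    """Retrain only the last layer on frozen features with fixed per-example weights."""
    erm_head = erm_head.detach()
    with torch.no_grad():
        # Probability the first-stage (ERM) model assigns to the correct class.
        probs = F.softmax(embeddings @ erm_head.T, dim=-1)
        p_correct = probs[torch.arange(len(labels), device=labels.device), labels]
        # Upweight examples the ERM model classifies poorly (low p_correct).
        weights = torch.exp(-gamma * p_correct)
        weights = weights / weights.sum()

    head = erm_head.clone().requires_grad_(True)
    opt = torch.optim.SGD([head], lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        logits = embeddings @ head.T
        per_example = F.cross_entropy(logits, labels, reduction="none")
        loss = (weights * per_example).sum()
        # Regularize the retrained head toward the ERM solution (strength lam).
        loss = loss + lam * (head - erm_head).pow(2).sum()
        loss.backward()
        opt.step()
    return head.detach()
```

In this sketch, γ would be chosen among the 33 linearly spaced values in [4, 20] and λ among {0, 0.1, 0.2, 0.3, 0.4} using validation worst-group accuracy, as reported in the rows above.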