Robust Learning with Progressive Data Expansion Against Spurious Correlation

Authors: Yihe Deng, Yu Yang, Baharan Mirzasoleiman, Quanquan Gu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on synthetic and real-world benchmark datasets confirm the superior performance of our method on models such as ResNets and Transformers.
Researcher Affiliation | Academia | Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095
Pseudocode | Yes | Algorithm 1: Progressive Data Expansion (PDE)
Open Source Code | Yes | Code is available at https://github.com/uclaml/PDE.
Open Datasets | Yes | We evaluate on three widely used datasets across vision and language tasks for spurious correlation: (1) Waterbirds (Sagawa et al., 2019) contains bird images labeled as waterbird or landbird, placed against a water or land background... (2) CelebA (Liu et al., 2015) is used to study gender as the spurious feature for hair color classification... (3) CivilComments-WILDS (Koh et al., 2021b) classifies toxic and non-toxic online comments while dealing with demographic information.
Dataset Splits | Yes | All methods use validation data for early stopping and model selection.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | The paper mentions software such as PyTorch and the WILDS library through citations, but it does not specify version numbers for these or for any other key libraries used in the experimental setup.
Experiment Setup | Yes | Require: number of iterations T_0 for warm-up training; number of times K for dataset expansion; number of iterations J for expansion training; number of data points m for each expansion; learning rate η; momentum coefficient γ; initialization scale σ_0; training set S = {(x_i, y_i, a_i)}_{i=1}^n; model f_W. Table 5: Ablation study on Waterbirds. Exp. size: number of data points added in each expansion. Exp. lr: the learning rate in the expansion stage. Table 6: Ablation study on Waterbirds. Exp. lr: the learning rate in the expansion stage.
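The Experiment Setup row lists the inputs of PDE: T0 warm-up iterations, K expansions of m data points each, and J training iterations after every expansion. The sketch below illustrates only the resulting data schedule, not the training itself. Two details are assumptions, not taken from this page: the warm-up subset is drawn with equal counts from every (label y, attribute a) group, and each expansion adds m points sampled uniformly from the data not yet in use. The helper names balanced_warmup_indices and pde_schedule are hypothetical; Algorithm 1 in the paper and the repository at https://github.com/uclaml/PDE remain the authoritative description.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple

# A stage is (name, number of iterations, indices of training examples in use).
Stage = Tuple[str, int, List[int]]


def balanced_warmup_indices(labels: Sequence[int], attrs: Sequence[int],
                            seed: int = 0) -> List[int]:
    """Hypothetical helper: pick the same number of examples from every
    (label, attribute) group, capped by the smallest group (an assumption)."""
    groups = defaultdict(list)
    for i, (y, a) in enumerate(zip(labels, attrs)):
        groups[(y, a)].append(i)
    per_group = min(len(idx) for idx in groups.values())
    rng = random.Random(seed)
    chosen: List[int] = []
    for key in sorted(groups):  # fixed group order keeps sampling reproducible
        chosen.extend(rng.sample(groups[key], per_group))
    return sorted(chosen)


def pde_schedule(labels: Sequence[int], attrs: Sequence[int],
                 T0: int, K: int, J: int, m: int, seed: int = 0) -> List[Stage]:
    """Hypothetical sketch of a PDE-style data schedule: a warm-up stage of T0
    iterations, then K expansions that each add up to m new examples and
    train for J iterations. Uniform sampling of new examples is an assumption."""
    active = balanced_warmup_indices(labels, attrs, seed)
    in_use = set(active)
    remaining = [i for i in range(len(labels)) if i not in in_use]
    random.Random(seed + 1).shuffle(remaining)

    stages: List[Stage] = [("warm-up", T0, list(active))]
    for k in range(K):
        # Take the next m unused examples (fewer if the data runs out).
        new, remaining = remaining[:m], remaining[m:]
        active = active + new
        stages.append((f"expansion {k + 1}", J, sorted(active)))
    return stages


if __name__ == "__main__":
    # Toy data: y = class label, a = spurious attribute (e.g., background).
    labels = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
    attrs = [0, 0, 0, 1, 1, 1, 1, 0, 0, 1]
    for name, iters, idx in pde_schedule(labels, attrs, T0=100, K=2, J=50, m=2):
        print(f"{name}: {iters} iterations on {len(idx)} examples")
```

On the toy data the two smallest groups hold one example each, so the warm-up stage uses 4 examples and the two expansions grow the pool to 6 and then 8. Each stage would then run an optimizer (for example, SGD with learning rate η and momentum γ) for the listed number of iterations on its active indices; that loop is omitted here.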
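Each dataset in the Open Datasets row pairs a label y with a spurious attribute a (background for Waterbirds, gender for CelebA, demographic information for CivilComments-WILDS), which corresponds to the a_i in the training set S. Benchmarks of this kind are commonly scored by worst-group accuracy, the lowest accuracy over all (y, a) groups, because a model that leans on the spurious attribute can look good on average while failing on the minority groups. The sketch below computes both numbers on toy data; the function names are hypothetical and are not taken from the PDE code or the WILDS library.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def group_accuracies(preds: Sequence[int], labels: Sequence[int],
                     attrs: Sequence[int]) -> Dict[Tuple[int, int], float]:
    """Hypothetical helper: accuracy within each (label y, attribute a) group."""
    correct: Dict[Tuple[int, int], int] = defaultdict(int)
    total: Dict[Tuple[int, int], int] = defaultdict(int)
    for p, y, a in zip(preds, labels, attrs):
        total[(y, a)] += 1
        correct[(y, a)] += int(p == y)
    return {g: correct[g] / total[g] for g in total}


def worst_group_accuracy(preds: Sequence[int], labels: Sequence[int],
                         attrs: Sequence[int]) -> float:
    """Lowest per-group accuracy over all (y, a) groups present in the data."""
    return min(group_accuracies(preds, labels, attrs).values())


if __name__ == "__main__":
    # Waterbirds-style toy data: y = 1 for waterbird, a = 1 for water background.
    labels = [1, 1, 1, 0, 0, 0]
    attrs = [1, 1, 0, 0, 0, 1]
    preds = [1, 1, 0, 0, 0, 1]  # this "model" predicts the background, not the bird
    avg = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    print(f"average accuracy: {avg:.2f}")  # 0.67
    print(f"worst-group accuracy: {worst_group_accuracy(preds, labels, attrs):.2f}")  # 0.00
```

The toy predictor copies the background attribute, so it is right on the two majority groups and wrong on both minority groups: average accuracy stays at 0.67 while worst-group accuracy drops to 0.00.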