Robust Learning with Progressive Data Expansion Against Spurious Correlation
Authors: Yihe Deng, Yu Yang, Baharan Mirzasoleiman, Quanquan Gu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on synthetic and real-world benchmark datasets confirm the superior performance of our method on models such as ResNets and Transformers. |
| Researcher Affiliation | Academia | Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095 |
| Pseudocode | Yes | Algorithm 1 Progressive Data Expansion (PDE) |
| Open Source Code | Yes | Codes are available at https://github.com/uclaml/PDE. |
| Open Datasets | Yes | We evaluate on three widely used datasets across vision and language tasks for spurious correlation: (1) Waterbirds (Sagawa et al., 2019) contains bird images labeled as waterbird or landbird, placed against a water or land background... (2) CelebA (Liu et al., 2015) is used to study gender as the spurious feature for hair color classification... (3) CivilComments-WILDS (Koh et al., 2021b) classifies toxic and non-toxic online comments while dealing with demographic information. |
| Dataset Splits | Yes | All methods use validation data for early stopping and model selection |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions software such as PyTorch and the WILDS library through citations, but it does not specify version numbers for these or for any other key libraries used in the experimental setup. |
| Experiment Setup | Yes | Require: Number of iterations $T_0$ for warm-up training; number of times $K$ for dataset expansion; number of iterations $J$ for expansion training; number of data $m$ for each expansion; learning rate $\eta$; momentum coefficient $\gamma$; initialization scale $\sigma_0$; training set $S = \{(x_i, y_i, a_i)\}_{i=1}^{n}$; model $f_W$. Table 5: Ablation study on Waterbirds. Exp. size: number of data points added in each expansion. Exp. lr: the learning rate in expansion stage. Table 6: Ablation study on Waterbirds. Exp. lr: the learning rate in expansion stage. |
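
The inputs listed under Experiment Setup (warm-up iterations, number of expansions, iterations per expansion, and points added per expansion) suggest a staged training schedule. The sketch below shows one way such a schedule could be laid out. It is an illustration only and is not taken from the paper or from the uclaml/PDE repository: the function name `expansion_schedule`, the `warmup_indices` argument, and the choice to draw new points uniformly at random from the unused data are all assumptions.

```python
import random


def expansion_schedule(n, warmup_indices, T0, K, J, m, seed=0):
    """Return a list of (start_iter, end_iter, active_indices) training stages.

    Assumptions (not confirmed by the source table):
      * stage 0 trains on `warmup_indices` for T0 iterations;
      * each of the K expansion stages adds up to m previously unused
        indices, drawn at random, and then trains for J iterations.
    """
    rng = random.Random(seed)
    used = set(warmup_indices)
    # Indices not in the warm-up subset, in a random order.
    remaining = [i for i in range(n) if i not in used]
    rng.shuffle(remaining)

    active = list(warmup_indices)
    stages = [(0, T0, list(active))]
    t = T0
    for k in range(K):
        # Take the next m unused indices (fewer if the pool runs out).
        new_points = remaining[k * m:(k + 1) * m]
        active.extend(new_points)
        stages.append((t, t + J, list(active)))
        t += J
    return stages


if __name__ == "__main__":
    for start, end, idx in expansion_schedule(
        n=20, warmup_indices=[0, 1, 2, 3], T0=100, K=3, J=10, m=5
    ):
        print(f"iterations [{start}, {end}): {len(idx)} training points")
```

In a real training loop, each stage's `active_indices` would select the subset of $S$ that the optimizer (with learning rate $\eta$ and momentum $\gamma$) samples from during that stage; how the warm-up subset is chosen is left open here.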