reproducibilityindex.ai

Robust Learning with Progressive Data Expansion Against Spurious Correlation

Authors: Yihe Deng, Yu Yang, Baharan Mirzasoleiman, Quanquan Gu

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on synthetic and real-world benchmark datasets confirm the superior performance of our method on models such as ResNets and Transformers.
Researcher Affiliation	Academia	Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095
Pseudocode	Yes	Algorithm 1 Progressive Data Expansion (PDE)
Open Source Code	Yes	Codes are available at https://github.com/uclaml/PDE.
Open Datasets	Yes	We evaluate on three widely used datasets across vision and language tasks for spurious correlation: (1) Waterbirds (Sagawa et al., 2019) contains bird images labeled as waterbird or landbird, placed against a water or land background... (2) CelebA (Liu et al., 2015) is used to study gender as the spurious feature for hair color classification... (3) CivilComments-WILDS (Koh et al., 2021b) classifies toxic and non-toxic online comments while dealing with demographic information.
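The Waterbirds description implies four groups, one for each (bird label, background) pair, with the background acting as the spurious attribute. A minimal sketch of that grouping, assuming a hypothetical binary encoding (0 = landbird / land background, 1 = waterbird / water background); `group_id` and `group_counts` are illustrative helpers, not functions from the PDE repository:

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical encoding: y = 0 landbird, 1 waterbird;
# a = 0 land background, 1 water background.


def group_id(y: int, a: int, num_attrs: int = 2) -> int:
    """Map a (label, spurious attribute) pair to a single group index."""
    return y * num_attrs + a


def group_counts(pairs: Iterable[Tuple[int, int]], num_attrs: int = 2) -> Counter:
    """Count how many examples fall into each (label, attribute) group."""
    return Counter(group_id(y, a, num_attrs) for y, a in pairs)


# Toy data where label and background mostly agree (the spurious correlation),
# leaving the two "minority" groups with a single example each.
pairs = [(0, 0)] * 6 + [(1, 1)] * 6 + [(0, 1), (1, 0)]
print(sorted(group_counts(pairs).items()))  # [(0, 6), (1, 1), (2, 1), (3, 6)]
```

The imbalance between majority groups (0 and 3) and minority groups (1 and 2) is what lets a model score well on average while relying on the background.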
Dataset Splits	Yes	All methods use validation data for early stopping and model selection.
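The excerpt does not say which validation metric drives early stopping and model selection. Worst-group accuracy is a common choice on spurious-correlation benchmarks, so the sketch below assumes that criterion together with a simple patience rule; `worst_group_accuracy` and `select_checkpoint` are hypothetical helpers, not taken from the paper's code:

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def worst_group_accuracy(preds: Sequence[int], labels: Sequence[int],
                         groups: Sequence[int]) -> float:
    """Accuracy of the worst-performing group (assumed selection metric)."""
    correct: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)
    return min(correct[g] / total[g] for g in total)


def select_checkpoint(val_scores: List[float], patience: int) -> int:
    """Return the epoch with the best validation score seen before stopping.

    Training stops once the score has failed to improve for `patience`
    consecutive epochs; later epochs are never evaluated.
    """
    best_epoch, best_score, since_best = 0, float("-inf"), 0
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            best_epoch, best_score, since_best = epoch, score, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best_epoch


print(worst_group_accuracy([1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1]))  # 0.5
print(select_checkpoint([0.5, 0.7, 0.6, 0.65, 0.8], patience=2))  # 1
```

With `patience=2`, training halts after epoch 3, so the later 0.8 score is never reached; a larger patience would select epoch 4 instead.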
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies	No	The paper mentions software such as PyTorch and the WILDS library through citations, but it does not specify version numbers for these or any other key libraries used in the experimental setup.
Experiment Setup	Yes	Require: number of iterations $T_0$ for warm-up training; number of times $K$ for dataset expansion; number of iterations $J$ for expansion training; number of data points $m$ for each expansion; learning rate $\eta$; momentum coefficient $\gamma$; initialization scale $\sigma_0$; training set $S = \{(x_i, y_i, a_i)\}_{i=1}^{n}$; model $f_{W}$. Also: Table 5: Ablation study on Waterbirds (Exp. size: number of data points added in each expansion; Exp. lr: the learning rate in the expansion stage). Table 6: Ablation study on Waterbirds (Exp. lr: the learning rate in the expansion stage).
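Read together, the required inputs suggest a two-phase schedule: $T_0$ warm-up iterations, then $K$ expansions that each add $m$ examples and train for $J$ iterations. The sketch below builds that data schedule under two stated assumptions: the warm-up subset is group-balanced (`per_group` examples per group), and each expansion draws uniformly at random from the data not yet used. `pde_schedule`, `run_pde`, and the `train_step` callback are hypothetical names, and the official repository may differ; optimizer details ($\eta$, $\gamma$, $\sigma_0$, and the separate expansion-stage learning rate ablated in Tables 5 and 6) are left to `train_step`.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def balanced_warmup_indices(groups: Sequence[int], per_group: int,
                            rng: random.Random) -> List[int]:
    """Sample up to `per_group` indices from every group (assumed balanced warm-up)."""
    by_group: Dict[int, List[int]] = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    warmup: List[int] = []
    for g in sorted(by_group):
        members = by_group[g]
        warmup.extend(rng.sample(members, min(per_group, len(members))))
    return warmup


def pde_schedule(groups: Sequence[int], per_group: int, K: int, m: int,
                 seed: int = 0) -> List[List[int]]:
    """Return the index set used in each stage.

    Stage 0 is the warm-up subset; stages 1..K each add m new examples drawn
    uniformly at random (an assumption) from the data not yet used.
    """
    rng = random.Random(seed)
    current = balanced_warmup_indices(groups, per_group, rng)
    used = set(current)
    remaining = [i for i in range(len(groups)) if i not in used]
    rng.shuffle(remaining)
    stages = [list(current)]
    for k in range(K):
        new = remaining[k * m:(k + 1) * m]
        if not new:  # stop early if the training set is exhausted
            break
        current = current + new
        stages.append(list(current))
    return stages


def run_pde(stages: List[List[int]], T0: int, J: int,
            train_step: Callable[[List[int]], None]) -> None:
    """Run T0 warm-up steps on stage 0, then J steps on each expanded set."""
    for _ in range(T0):
        train_step(stages[0])
    for subset in stages[1:]:
        for _ in range(J):
            train_step(subset)


# Four groups with two small minority groups, as in a Waterbirds-style split.
groups = [0] * 10 + [1] * 2 + [2] * 2 + [3] * 10
stages = pde_schedule(groups, per_group=2, K=3, m=4)
print([len(s) for s in stages])  # [8, 12, 16, 20]
```

Because each stage keeps every earlier index, the model never loses the balanced warm-up data; it only sees the majority-heavy remainder gradually.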