Why does Throwing Away Data Improve Worst-Group Error?

Authors: Kamalika Chaudhuri, Kartik Ahuja, Martin Arjovsky, David Lopez-Paz

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct separate analyses to understand the case of imbalanced classes (Section 4) and groups (Section 5). In particular, subsampling outperforms ERM in worst-group error when learning from imbalanced classes with tails (such as Gaussians, Theorem 3), while it makes no difference when learning from distributions without tails (such as Uniforms, Theorem 4). Similar results follow for balanced classes but imbalanced groups (Theorem 6 for groups with tails, Theorem 7 for groups without tails). We extend these results to the high-dimensional case where a multitude of noise dimensions pollute the data (Theorems 5, 8, 9) and provide empirical support for our theories using Waterbirds and CelebA, the two most common datasets to benchmark worst-group error (Section 6).
Researcher Affiliation | Collaboration | FAIR (Meta AI), UC San Diego, Inria.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | We train in PyTorch using the same environment from (Idrissi et al., 2021), provided at https://github.com/facebookresearch/BalancingGroups.
Open Datasets | Yes | These questions are considered in the context of Waterbirds (Sagawa et al., 2019) and CelebA (Liu et al., 2015), the two most commonly used datasets for studying group imbalance.
Dataset Splits | No | The paper mentions total data points for Waterbirds (4795) and CelebA (162770) but does not specify training, validation, or test split percentages or counts. It only refers to 'fine-tuned on Waterbirds ... and CelebA datasets'.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used to run its experiments.
Software Dependencies | No | The paper mentions "We train in PyTorch" but does not specify a version for PyTorch or for any other software dependency.
Experiment Setup | Yes | We use the Adam optimizer with a learning rate of 10^-4 and a weight decay of 10^-3, and train for 10 epochs with a batch size of 128. Linear Layer Learning: We use the Adam optimizer with a learning rate of 10^-2 and train for 100 epochs with a batch size of 128.
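For context on the Research Type row, the subsampling intervention that the paper contrasts with ERM simply discards examples from the larger groups until every group is equally represented. Below is a minimal NumPy sketch of that idea; the function and array names are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def subsample_to_smallest_group(features, labels, groups, seed=0):
    """Subsample every group down to the size of the smallest group.

    This mirrors the 'subsampling' baseline the paper compares against ERM:
    training on the reduced, group-balanced dataset instead of on all data.
    `features`, `labels`, and `groups` are NumPy arrays of equal length;
    these names are placeholders chosen for this sketch.
    """
    rng = np.random.default_rng(seed)
    group_ids, counts = np.unique(groups, return_counts=True)
    n_min = counts.min()  # size of the smallest group

    keep = []
    for g in group_ids:
        idx = np.flatnonzero(groups == g)
        # Keep a random subset of size n_min from this group.
        keep.append(rng.choice(idx, size=n_min, replace=False))
    keep = np.concatenate(keep)

    return features[keep], labels[keep], groups[keep]
```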
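The Experiment Setup row translates directly into optimizer configuration. The sketch below shows one way to set this up in PyTorch, assuming a placeholder model and generic data loaders; only the optimizer choice, learning rates, weight decay, epoch counts, and batch size come from the paper, everything else is an assumption for illustration.

```python
import torch
from torch import nn, optim

# Placeholder model: the paper fine-tunes a pretrained network and also
# trains a linear layer on top of features, but the architecture here is
# an assumption, not the paper's.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 2))

# Fine-tuning phase (as quoted above): Adam, lr 1e-4, weight decay 1e-3,
# 10 epochs, batch size 128.
finetune_opt = optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-3)

# Linear-layer phase (as quoted above): Adam, lr 1e-2, 100 epochs,
# batch size 128; no weight decay is stated for this phase.
linear_opt = optim.Adam(model[-1].parameters(), lr=1e-2)

criterion = nn.CrossEntropyLoss()

# Example loader construction (dataset objects are not shown here):
# loader = torch.utils.data.DataLoader(dataset, batch_size=128, shuffle=True)

def train(model, optimizer, loader, epochs):
    """Generic training loop; `loader` yields (inputs, targets) batches."""
    model.train()
    for _ in range(epochs):
        for inputs, targets in loader:
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
```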