Representation Matters: Assessing the Importance of Subgroup Allocations in Training Data

Authors: Esther Rolf, Theodora T. Worledge, Benjamin Recht, Michael Jordan

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4. Empirical Results: Having shown the importance of training set allocations from a theoretical perspective, we now provide a complementary empirical investigation of this phenomenon. See Appendix B for full details on each experimental setup. Figure 1 highlights the importance of at least a minimal representation of each group in order to achieve low population loss (black curves) for all objectives.
Researcher Affiliation | Academia | 1 Department of EECS, University of California, Berkeley; 2 Department of Statistics, University of California, Berkeley.
Pseudocode | No | The paper describes its methods through mathematical formulations and narrative text, but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code to replicate the experiments is available at https://github.com/estherrolf/representation-matters.
Open Datasets | Yes | We use a wide range of datasets to give a full empirical characterization of the phenomena of interest (see Table 1). The CIFAR-4 dataset is comprised of bird, car, horse, and plane image instances from CIFAR-10 (Krizhevsky, 2009). The ISIC dataset contains images of skin lesions labelled as benign or malignant (Codella et al., 2019). The Goodreads dataset consists of written book reviews and numerical ratings (Wan & McAuley, 2018). The Mooc dataset contains student demographic and participation data (HarvardX, 2014). The Adult dataset consists of demographic data from the 1994 Census (Dua & Graff, 2017).
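The excerpt above describes CIFAR-4 as a four-class subset of CIFAR-10, but does not spell out how that subset is built. The following is a minimal sketch of one way to assemble it from torchvision's CIFAR-10, assuming the standard CIFAR-10 label order; the function name and defaults are hypothetical and not taken from the paper's released code.

```python
# Illustrative sketch only (not the paper's released code): building a CIFAR-4
# subset of bird, car, horse, and plane images from torchvision's CIFAR-10.
import numpy as np
from torchvision.datasets import CIFAR10

# Standard CIFAR-10 label ids: 0=plane, 1=car, 2=bird, 7=horse
CIFAR4_LABELS = [0, 1, 2, 7]

def make_cifar4(root="data", train=True):
    """Return images and relabeled targets (0..3) restricted to the four classes."""
    ds = CIFAR10(root=root, train=train, download=True)
    targets = np.array(ds.targets)
    keep = np.isin(targets, CIFAR4_LABELS)
    remap = {old: new for new, old in enumerate(CIFAR4_LABELS)}
    images = ds.data[keep]                                  # (N, 32, 32, 3) uint8 array
    labels = np.array([remap[t] for t in targets[keep]])    # relabeled to 0..3
    return images, labels
```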
Dataset Splits | Yes | We pick models and parameters via a cross-validation procedure over a coarse grid of α; details are given in Appendix B.3.
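Here α denotes the subgroup allocation, i.e., the fraction of the training set drawn from one group. As a point of reference only, the sketch below shows one plausible way to build a fixed-size training sample for a given α over two groups; the function and the grid are assumptions for illustration, not the repository's implementation.

```python
# Minimal sketch, assuming two groups: subsample a fixed-size training set so
# that a fraction alpha comes from group 0 and (1 - alpha) from group 1.
import numpy as np

def subsample_by_allocation(group_ids, n_total, alpha, seed=0):
    """Return indices with ~alpha * n_total examples from group 0, the rest from group 1."""
    rng = np.random.default_rng(seed)
    group_ids = np.asarray(group_ids)
    idx_a = np.flatnonzero(group_ids == 0)
    idx_b = np.flatnonzero(group_ids == 1)
    n_a = int(round(alpha * n_total))
    chosen_a = rng.choice(idx_a, size=n_a, replace=False)
    chosen_b = rng.choice(idx_b, size=n_total - n_a, replace=False)
    return np.concatenate([chosen_a, chosen_b])

# Example sweep over a coarse allocation grid, e.g. for cross-validation:
# for alpha in np.linspace(0.1, 0.9, 5):
#     train_idx = subsample_by_allocation(group_ids, n_total=10_000, alpha=alpha)
```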
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) used in the experiments.
Experiment Setup | Yes | We pick models and parameters via a cross-validation procedure over a coarse grid of α; details are given in Appendix B.3. For the image classification tasks, we compare group-agnostic empirical risk minimization (ERM) to importance weighting (implemented via importance sampling (IS) batches following the findings of Buda et al. (2018)) and group distributionally robust optimization (GDRO) with group-dependent regularization as in Sagawa et al. (2020).
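Of the three objectives quoted above, the importance-sampling variant is the most mechanical to illustrate. The sketch below shows one common way to form IS batches with PyTorch's WeightedRandomSampler, sampling each example with probability inversely proportional to its group's size; the weighting scheme and names are assumptions for illustration, not the paper's exact configuration, and the GDRO objective of Sagawa et al. (2020) is not shown.

```python
# Illustrative sketch of importance sampling (IS) batches: minority-group
# examples are drawn more often via inverse-group-frequency weights.
import numpy as np
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

def make_is_loader(dataset, group_ids, batch_size=128):
    """Build a DataLoader whose batches oversample the smaller group."""
    group_ids = np.asarray(group_ids)
    counts = np.bincount(group_ids)               # examples per group
    weights = 1.0 / counts[group_ids]             # per-example sampling weight
    sampler = WeightedRandomSampler(
        weights=torch.as_tensor(weights, dtype=torch.double),
        num_samples=len(group_ids),
        replacement=True,
    )
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```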