Representation Matters: Assessing the Importance of Subgroup Allocations in Training Data
Authors: Esther Rolf, Theodora T. Worledge, Benjamin Recht, Michael I. Jordan
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4. Empirical Results: Having shown the importance of training set allocations from a theoretical perspective, we now provide a complementary empirical investigation of this phenomenon. See Appendix B for full details on each experimental setup. Figure 1 highlights the importance of at least a minimal representation of each group in order to achieve low population loss (black curves) for all objectives. |
| Researcher Affiliation | Academia | Department of EECS, University of California, Berkeley; Department of Statistics, University of California, Berkeley. |
| Pseudocode | No | The paper describes its methods through mathematical formulations and narrative text, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code to replicate the experiments is available at https://github.com/estherrolf/representation-matters. |
| Open Datasets | Yes | We use a wide range of datasets to give a full empirical characterization of the phenomena of interest (see Table 1). The CIFAR-4 dataset is comprised of bird, car, horse, and plane image instances from CIFAR-10 (Krizhevsky, 2009). The ISIC dataset contains images of skin lesions labelled as benign or malignant (Codella et al., 2019). The Goodreads dataset consists of written book reviews and numerical ratings (Wan & McAuley, 2018). The Mooc dataset contains student demographic and participation data (HarvardX, 2014). The Adult dataset consists of demographic data from the 1994 Census (Dua & Graff, 2017). An illustrative preprocessing sketch based on this description appears after the table. |
| Dataset Splits | Yes | We pick models and parameters via a cross-validation procedure over a coarse grid of α; details are given in Appendix B.3. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) used in the experiments. |
| Experiment Setup | Yes | We pick models and parameters via a cross-validation procedure over a coarse grid of α; details are given in Appendix B.3. For the image classification tasks, we compare group-agnostic empirical risk minimization (ERM) to importance weighting (implemented via importance sampling (IS) batches following the findings of Buda et al. (2018)) and group distributionally robust optimization (GDRO) with group-dependent regularization as in Sagawa et al. (2020). Hedged sketches of these training baselines appear after the table. |
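
The Open Datasets and Dataset Splits rows above describe building CIFAR-4 from four CIFAR-10 classes and tuning over a coarse grid of subgroup allocations α. The snippet below is a minimal sketch of that style of preprocessing, assuming torchvision access to CIFAR-10 and a hypothetical two-group split (animal vs. vehicle classes); it is illustrative only and not the authors' released pipeline, which is linked in the Open Source Code row.

```python
import numpy as np
from torchvision.datasets import CIFAR10

# Hypothetical CIFAR-4 construction: keep only plane, car, bird, and horse
# images from CIFAR-10 (label ids 0, 1, 2, 7 for airplane, automobile,
# bird, horse).
KEEP = {0: "plane", 1: "car", 2: "bird", 7: "horse"}

cifar = CIFAR10(root="data", train=True, download=True)
labels = np.array(cifar.targets)
keep_mask = np.isin(labels, list(KEEP.keys()))
images, labels = cifar.data[keep_mask], labels[keep_mask]  # labels keep CIFAR-10 ids

# Illustrative grouping into two subgroups: animals (bird, horse) vs. vehicles.
groups = np.isin(labels, [2, 7]).astype(int)  # 1 = animal, 0 = vehicle

def subsample_to_allocation(groups, n_total, alpha, seed=0):
    """Return indices of an n_total-sized training subset in which a
    fraction alpha of examples comes from group 1 and 1 - alpha from group 0."""
    rng = np.random.default_rng(seed)
    idx_g1 = rng.choice(np.where(groups == 1)[0], int(round(alpha * n_total)), replace=False)
    idx_g0 = rng.choice(np.where(groups == 0)[0], n_total - len(idx_g1), replace=False)
    return np.concatenate([idx_g1, idx_g0])

# Coarse grid of allocations alpha, in the spirit of the quoted cross-validation.
for alpha in [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0]:
    subset = subsample_to_allocation(groups, n_total=5000, alpha=alpha)
    # ... train a model on images[subset], labels[subset] and record
    # per-group and population validation loss ...
```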
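The Experiment Setup row contrasts group-agnostic ERM with importance weighting implemented via importance-sampled (IS) batches. One common way to realize group-balanced IS batches in PyTorch is a `WeightedRandomSampler` whose per-example weights are inversely proportional to group size; the sketch below assumes a `groups` array like the one constructed above and may differ from the exact mechanism in the released code.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Hypothetical tensors; in practice these would hold the CIFAR-4 subset.
X = torch.randn(5000, 3, 32, 32)
y = torch.randint(0, 2, (5000,))
groups = torch.randint(0, 2, (5000,))

# Importance-sampling batches: each example is drawn with weight proportional
# to 1 / (size of its group), so minority-group examples appear more often
# per batch than under uniform (ERM) sampling.
group_counts = torch.bincount(groups).float()
example_weights = 1.0 / group_counts[groups]

sampler = WeightedRandomSampler(example_weights, num_samples=len(y), replacement=True)
loader = DataLoader(TensorDataset(X, y, groups), batch_size=128, sampler=sampler)
```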
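The same row also mentions group distributionally robust optimization (GDRO) with group-dependent regularization (Sagawa et al., 2020). The sketch below shows the standard online GDRO update, in which per-group weights `q` move multiplicatively toward the groups with the highest current loss and the model minimizes the `q`-weighted group losses; the step size `eta` is a placeholder and the group-dependent regularization term is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def group_dro_loss(logits, y, groups, q, eta=0.01):
    """One step of the online group-DRO objective (sketch).

    q is a vector of per-group weights on the simplex; it is updated
    multiplicatively toward the groups with the highest current loss,
    and the returned scalar is the q-weighted sum of group losses.
    """
    losses = F.cross_entropy(logits, y, reduction="none")
    n_groups = q.numel()
    group_losses = torch.stack([
        losses[groups == g].mean() if (groups == g).any() else losses.new_zeros(())
        for g in range(n_groups)
    ])
    # Exponentiated-gradient ascent on q (no gradient flows through q).
    with torch.no_grad():
        q *= torch.exp(eta * group_losses)
        q /= q.sum()
    return (q * group_losses).sum()

# Usage sketch inside a training loop (model, optimizer, loader assumed):
# q = torch.ones(2) / 2
# for xb, yb, gb in loader:
#     loss = group_dro_loss(model(xb), yb, gb, q)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```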