reproducibilityindex.ai

Representation Matters: Assessing the Importance of Subgroup Allocations in Training Data

Authors: Esther Rolf, Theodora T Worledge, Benjamin Recht, Michael Jordan

ICML 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	4. Empirical Results Having shown the importance of training set allocations from a theoretical perspective, we now provide a complementary empirical investigation of this phenomenon. See Appendix B for full details on each experimental setup. Figure 1 highlights the importance of at least a minimal representation of each group in order to achieve low population loss (black curves) for all objectives.
Researcher Affiliation	Academia	1Department of EECS, University of California, Berkeley 2Department of Statistics, University of California, Berkeley.
Pseudocode	No	The paper describes its methods through mathematical formulations and narrative text, but does not include structured pseudocode or algorithm blocks.
Open Source Code	Yes	Code to replicate the experiments is available at https://github.com/estherrolf/representation-matters.
Open Datasets	Yes	We use a wide range of datasets to give a full empirical characterization of the phenomena of interest (see Table 1). The CIFAR-4 dataset is comprised of bird, car, horse, and plane image instances from CIFAR-10 (Krizhevsky, 2009). The ISIC dataset contains images of skin lesions labelled as benign or malignant (Codella et al., 2019). The Goodreads dataset consists of written book reviews and numerical ratings (Wan & McAuley, 2018). The Mooc dataset contains student demographic and participation data (HarvardX, 2014). The Adult dataset consists of demographic data from the 1994 Census (Dua & Graff, 2017).
Dataset Splits	Yes	We pick models and parameters via a cross-validation procedure over a coarse grid of α; details are given in Appendix B.3.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies	No	The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) used in the experiments.
Experiment Setup	Yes	We pick models and parameters via a cross-validation procedure over a coarse grid of α; details are given in Appendix B.3. For the image classification tasks, we compare group-agnostic empirical risk minimization (ERM) to importance weighting (implemented via importance sampling (IS) batches following the findings of Buda et al. (2018)) and group distributionally robust optimization (GDRO) with group-dependent regularization as in Sagawa et al. (2020).
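
The importance sampling (IS) batches mentioned in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes that each example carries a group label and that batches should, in expectation, match a chosen set of target group fractions. The function names, the 90/10 toy allocation, and the 50/50 target are all hypothetical.

```python
import numpy as np


def importance_sampling_probs(groups, target_fracs):
    """Per-example sampling probabilities (hypothetical helper).

    Each group g receives total probability mass target_fracs[g],
    spread uniformly over the examples of that group, so a batch drawn
    with these probabilities has expected group shares equal to the targets.
    """
    groups = np.asarray(groups)
    probs = np.zeros(len(groups), dtype=float)
    for g, frac in target_fracs.items():
        mask = groups == g
        count = mask.sum()
        if count == 0:
            raise ValueError(f"group {g!r} has no examples to sample from")
        probs[mask] = frac / count
    # Normalize in case the target fractions do not sum exactly to one.
    return probs / probs.sum()


def sample_batch(rng, probs, batch_size):
    """Draw example indices with replacement according to probs."""
    return rng.choice(len(probs), size=batch_size, replace=True, p=probs)


# Toy training set: 90 examples from group 0, 10 from group 1 (assumed).
groups = np.array([0] * 90 + [1] * 10)
# Assumed target: both groups equally represented in each batch.
probs = importance_sampling_probs(groups, {0: 0.5, 1: 0.5})

rng = np.random.default_rng(0)
batch = sample_batch(rng, probs, batch_size=32)
```

Sampling with replacement means examples from the smaller group are reused more often, which is the sampling-based counterpart of upweighting their losses.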