No Subclass Left Behind: Fine-Grained Robustness in Coarse-Grained Classification Problems

Authors: Nimit Sohoni, Jared Dunnmon, Geoffrey Angus, Albert Gu, Christopher Ré

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate GEORGE on a mix of real-world and benchmark image classification datasets, and show that our approach boosts worst-case subclass accuracy by up to 22 percentage points compared to standard training techniques, without requiring any information about the subclasses. (This metric is illustrated in the first sketch below the table.)
Researcher Affiliation | Academia | Department of Computer Science, Stanford University; Institute for Computational and Mathematical Engineering, Stanford University. {nims, jdunnmon, gdlangus, albertgu, chrismre}@cs.stanford.edu
Pseudocode | Yes | We provide more details below, and detailed pseudocode in Appendix B (Algorithm 1). (The GDRO training step this pseudocode covers is illustrated in the third sketch below the table.)
Open Source Code | Yes | To support these potential impacts, we have released a complete implementation of our code, with an easily usable PyTorch API: https://github.com/HazyResearch/hidden-stratification/
Open Datasets | Yes | Waterbirds, a robustness benchmark introduced to evaluate GDRO in [43], contains images... Undersampled MNIST (U-MNIST): We design U-MNIST as a modified version of MNIST [28]... CelebA: CelebA is a common face classification dataset also used as a robustness benchmark in [43]... ISIC: The ISIC skin cancer dataset [12] is a public real-world dataset for classifying skin lesions as malignant or benign.
Dataset Splits | No | The paper mentions 'training and validation sets' in Section 4.1 and 'Waterbirds validation/test sets' in Appendix B.2.2, confirming that splits were used. However, it does not provide specific split percentages, sample counts, or clear references to predefined splits with a detailed methodology in the main text.
Hardware Specification | No | The paper mentions using 'deep models' and acknowledges support from the 'HAI-AWS Cloud Credits for Research program'. However, it does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for the experiments.
Software Dependencies | No | The paper states that its released implementation provides a 'PyTorch API'. However, it does not specify a version number for PyTorch or any other software dependency, which is needed for exact reproducibility.
Experiment Setup | No | The paper describes GEORGE's overall two-step process and mentions techniques such as UMAP dimensionality reduction and the Silhouette (SIL) criterion for clustering (see the second sketch below the table). It also states that 'Additional details on datasets, model architectures, and experimental procedures are in Appendix B.' However, the main text does not provide concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or other system-level training settings.
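
The headline result in the Research Type row is reported in terms of worst-case (robust) subclass accuracy, i.e. the model's accuracy on its worst-performing subclass. Below is a minimal sketch of that metric, assuming NumPy arrays of predictions, superclass labels, and subclass ids; the function name and array layout are our own, not taken from the paper.

```python
import numpy as np

def worst_case_subclass_accuracy(preds, labels, subclass_ids):
    """Return the minimum per-subclass accuracy (the robust accuracy)."""
    accs = []
    for g in np.unique(subclass_ids):
        mask = subclass_ids == g
        accs.append(float((preds[mask] == labels[mask]).mean()))
    return min(accs)
```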
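The Experiment Setup row notes that GEORGE's first step clusters feature embeddings, using UMAP dimensionality reduction and the Silhouette (SIL) criterion to select the number of clusters. The sketch below shows one plausible version of such a pipeline built on umap-learn and scikit-learn; the candidate cluster counts, UMAP settings, and choice of k-means are assumptions on our part, and the authors' exact procedure is specified in their Appendix B and released code. In GEORGE, clustering of this kind is performed within each superclass, and the resulting assignments serve as subclass pseudolabels.

```python
import umap  # pip install umap-learn
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_superclass_features(features, k_candidates=range(2, 11), seed=0):
    """Reduce one superclass's feature embeddings with UMAP, then pick the
    number of clusters that maximizes the Silhouette score."""
    reduced = umap.UMAP(n_components=2, random_state=seed).fit_transform(features)
    best_score, best_labels = -1.0, None
    for k in k_candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(reduced)
        score = silhouette_score(reduced, labels)
        if score > best_score:
            best_score, best_labels = score, labels
    return best_labels  # subclass pseudolabels for this superclass
```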
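The Pseudocode row points to Algorithm 1 in Appendix B, which covers GEORGE's second step: training with a group distributionally robust optimization (GDRO) objective over the recovered pseudolabels. The sketch below is a minimal PyTorch illustration of a worst-group loss, not the authors' Algorithm 1; in particular, the GDRO of [43] maintains online group weights via an exponentiated-gradient update rather than taking a hard max per batch.

```python
import torch
import torch.nn.functional as F

def worst_group_loss(logits, targets, group_ids, n_groups):
    """Average per-example cross-entropy within each pseudolabel group and
    return the largest group average; minimizing this trains against the
    worst-performing (pseudo)subclass."""
    per_example = F.cross_entropy(logits, targets, reduction="none")
    group_losses = []
    for g in range(n_groups):
        mask = group_ids == g
        if mask.any():  # skip groups absent from this mini-batch
            group_losses.append(per_example[mask].mean())
    return torch.stack(group_losses).max()
```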