No Subclass Left Behind: Fine-Grained Robustness in Coarse-Grained Classification Problems
Authors: Nimit Sohoni, Jared Dunnmon, Geoffrey Angus, Albert Gu, Christopher Ré
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate GEORGE on a mix of real-world and benchmark image classification datasets, and show that our approach boosts worst-case subclass accuracy by up to 14 percentage points compared to standard training techniques, without requiring any information about the subclasses. |
| Researcher Affiliation | Academia | Department of Computer Science, Stanford University; Institute for Computational and Mathematical Engineering, Stanford University. {nims, jdunnmon, gangus, albertgu, chrismre}@cs.stanford.edu |
| Pseudocode | Yes | We provide more details below, and detailed pseudocode in Appendix B (Algorithm 1). |
| Open Source Code | Yes | To support these potential impacts, we have released a complete implementation of our code,³ with an easily usable PyTorch API. ³https://github.com/HazyResearch/hidden-stratification/ |
| Open Datasets | Yes | Waterbirds, a robustness benchmark introduced to evaluate GDRO in [43], contains images... Undersampled MNIST (U-MNIST): We design U-MNIST as a modified version of MNIST [28]... CelebA: CelebA is a common face classification dataset also used as a robustness benchmark in [43]... ISIC: The ISIC skin cancer dataset [12] is a public real-world dataset for classifying skin lesions as malignant or benign. |
| Dataset Splits | No | The paper mentions 'training and validation sets' in Section 4.1 and 'Waterbirds validation/test sets' in Appendix B.2.2, confirming their use. However, it does not provide specific split percentages, sample counts, or clear references to predefined splits with detailed methodology in the main text. |
| Hardware Specification | No | The paper mentions using 'deep models' and acknowledges support from the 'HAI-AWS Cloud Credits for Research program'. However, it does not provide specific hardware details such as GPU/CPU models, memory, or specific cloud instance types used for experiments. |
| Software Dependencies | No | The paper states that its implementation provides 'an easily usable PyTorch API'. However, it does not specify version numbers for PyTorch or any other software dependencies, which are required for reproducibility. |
| Experiment Setup | No | The paper describes the overall two-step process of GEORGE and mentions techniques such as 'UMAP dimensionality reduction' and the 'Silhouette (SIL) criterion' for clustering (see the sketch following this table). It also states 'Additional details on datasets, model architectures, and experimental procedures are in Appendix B.' However, the main text does not provide concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or other system-level training settings. |
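
GEORGE's first stage, as summarized in the Experiment Setup row above, clusters the feature representations of a model trained with standard ERM in order to estimate subclass (pseudo-)labels, applying UMAP dimensionality reduction and selecting the number of clusters with the Silhouette criterion. Below is a minimal sketch of that clustering step, not the authors' released implementation: it assumes precomputed per-superclass feature vectors, uses `umap-learn` and scikit-learn with k-means as the clusterer, and the function name `estimate_subclasses` and all hyperparameter values are illustrative.

```python
# Hypothetical sketch of GEORGE's Step 1 (subclass estimation), assuming
# `features` is an (N x D) array of activations from an ERM-trained model,
# restricted to the examples of a single superclass.
import numpy as np
import umap  # pip install umap-learn
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def estimate_subclasses(features, max_clusters=10, seed=0):
    """Reduce features with UMAP, then choose k via the Silhouette criterion."""
    # Project to a low-dimensional space where cluster structure is clearer.
    reducer = umap.UMAP(n_components=2, random_state=seed)
    embedded = reducer.fit_transform(features)

    best_k, best_score, best_labels = None, -1.0, None
    # Silhouette score needs at least 2 clusters; sweep k and keep the best.
    for k in range(2, max_clusters + 1):
        labels = KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(embedded)
        score = silhouette_score(embedded, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_labels, best_k

# Usage example with random stand-in features for one superclass.
rng = np.random.default_rng(0)
fake_features = rng.normal(size=(200, 64)).astype(np.float32)
pseudo_labels, k = estimate_subclasses(fake_features)
print(f"Estimated {k} subclasses")
```

In the paper's second stage, pseudo-labels like these would stand in for the unobserved subclass labels when training the final model with group DRO (GDRO); the specific clustering model, feature layer, and hyperparameters the authors used are detailed in their Appendix B rather than in this sketch.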