No Subclass Left Behind: Fine-Grained Robustness in Coarse-Grained Classification Problems

Authors: Nimit Sohoni, Jared Dunnmon, Geoffrey Angus, Albert Gu, Christopher Ré

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate GEORGE on a mix of real-world and benchmark image classification datasets, and show that our approach boosts worst-case subclass accuracy by up to 22 percentage points compared to standard training techniques, without requiring any information about the subclasses. (This metric is illustrated in the first sketch below the table.)
Researcher Affiliation | Academia | Department of Computer Science, Stanford University; Institute for Computational and Mathematical Engineering, Stanford University. {nims, jdunnmon, gdlangus, albertgu, chrismre}@cs.stanford.edu
Pseudocode | Yes | We provide more details below, and detailed pseudocode in Appendix B (Algorithm 1). (The GDRO training step this pseudocode covers is illustrated in the third sketch below the table.)
Open Source Code | Yes | To support these potential impacts, we have released a complete implementation of our code, with an easily usable PyTorch API: https://github.com/HazyResearch/hidden-stratification/
Open Datasets | Yes | Waterbirds, a robustness benchmark introduced to evaluate GDRO in [43], contains images... Undersampled MNIST (U-MNIST): We design U-MNIST as a modified version of MNIST [28]... CelebA: CelebA is a common face classification dataset also used as a robustness benchmark in [43]... ISIC: The ISIC skin cancer dataset [12] is a public real-world dataset for classifying skin lesions as malignant or benign.
Dataset Splits | No | The paper mentions 'training and validation sets' in Section 4.1 and 'Waterbirds validation/test sets' in Appendix B.2.2, confirming that splits were used. However, it does not provide specific split percentages, sample counts, or clear references to predefined splits with a detailed methodology in the main text.
Hardware Specification | No | The paper mentions using 'deep models' and acknowledges support from the 'HAI-AWS Cloud Credits for Research program'. However, it does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for the experiments.
Software Dependencies | No | The paper states that its released implementation provides a 'PyTorch API'. However, it does not specify a version number for PyTorch or any other software dependency, which is needed for exact reproducibility.
Experiment Setup | No | The paper describes GEORGE's overall two-step process and mentions techniques such as UMAP dimensionality reduction and the Silhouette (SIL) criterion for clustering (see the second sketch below the table). It also states that 'Additional details on datasets, model architectures, and experimental procedures are in Appendix B.' However, the main text does not provide concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or other system-level training settings.
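
The headline result in the Research Type row is reported in terms of worst-case (robust) subclass accuracy, i.e. the model's accuracy on its worst-performing subclass. Below is a minimal sketch of that metric, assuming NumPy arrays of predictions, superclass labels, and subclass ids; the function name and array layout are our own, not taken from the paper.

```python
import numpy as np

def worst_case_subclass_accuracy(preds, labels, subclass_ids):
    """Return the minimum per-subclass accuracy (the robust accuracy)."""
    accs = []
    for g in np.unique(subclass_ids):
        mask = subclass_ids == g
        accs.append(float((preds[mask] == labels[mask]).mean()))
    return min(accs)
```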
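The Experiment Setup row notes that GEORGE's first step clusters feature embeddings, using UMAP dimensionality reduction and the Silhouette (SIL) criterion to select the number of clusters. The sketch below shows one plausible version of such a pipeline built on umap-learn and scikit-learn; the candidate cluster counts, UMAP settings, and choice of k-means are assumptions on our part, and the authors' exact procedure is specified in their Appendix B and released code. In GEORGE, clustering of this kind is performed within each superclass, and the resulting assignments serve as subclass pseudolabels.

```python
import umap  # pip install umap-learn
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_superclass_features(features, k_candidates=range(2, 11), seed=0):
    """Reduce one superclass's feature embeddings with UMAP, then pick the
    number of clusters that maximizes the Silhouette score."""
    reduced = umap.UMAP(n_components=2, random_state=seed).fit_transform(features)
    best_score, best_labels = -1.0, None
    for k in k_candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(reduced)
        score = silhouette_score(reduced, labels)
        if score > best_score:
            best_score, best_labels = score, labels
    return best_labels  # subclass pseudolabels for this superclass
```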
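The Pseudocode row points to Algorithm 1 in Appendix B, which covers GEORGE's second step: training with a group distributionally robust optimization (GDRO) objective over the recovered pseudolabels. The sketch below is a minimal PyTorch illustration of a worst-group loss, not the authors' Algorithm 1; in particular, the GDRO of [43] maintains online group weights via an exponentiated-gradient update rather than taking a hard max per batch.

```python
import torch
import torch.nn.functional as F

def worst_group_loss(logits, targets, group_ids, n_groups):
    """Average per-example cross-entropy within each pseudolabel group and
    return the largest group average; minimizing this trains against the
    worst-performing (pseudo)subclass."""
    per_example = F.cross_entropy(logits, targets, reduction="none")
    group_losses = []
    for g in range(n_groups):
        mask = group_ids == g
        if mask.any():  # skip groups absent from this mini-batch
            group_losses.append(per_example[mask].mean())
    return torch.stack(group_losses).max()
```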