Selective Classification Can Magnify Disparities Across Groups

Authors: Erik Jones, Shiori Sagawa, Pang Wei Koh, Ananya Kumar, Percy Liang

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We observe this behavior consistently across five vision and NLP datasets." and "We consider five datasets (Table 1)"
Researcher Affiliation | Academia | Department of Computer Science, Stanford University; {erjones,ssagawa,pangwei,ananya,pliang}@cs.stanford.edu
Pseudocode | Yes | "Algorithm 1: Group-agnostic reference for (ŷ, ĉ) at threshold τ" and "Algorithm 2: Robin Hood reference at threshold τ" (see the selective-prediction sketch after this table)
Open Source Code | Yes | All code, data, and experiments are available on CodaLab at https://worksheets.codalab.org/worksheets/0x7ceb817d53b94b0c8294a7a22643bf5e. The code is also available on GitHub at https://github.com/ejones313/worst-group-sc.
Open Datasets | Yes | "We consider five datasets (Table 1) on which prior work has shown that models latch onto spurious correlations... CelebA. ... dataset (Liu et al., 2015). Waterbirds. ... dataset (Sagawa et al., 2020), constructed using images of birds from the Caltech-UCSD Birds dataset (Wah et al., 2011) placed on backgrounds from the Places dataset (Zhou et al., 2017). CheXpert-device. ... CheXpert dataset (Irvin et al., 2019)... CivilComments. ... dataset (Borkan et al., 2019). MultiNLI. ... MultiNLI dataset (Williams et al., 2018)."
Dataset Splits | Yes | "We use the official train-val split of the dataset." and "we first create a new 80/10/10 train/val/test split of examples from the publicly available CheXpert train and validation sets" (an illustrative split sketch follows the table)
Hardware Specification | No | No specific hardware details (such as GPU/CPU models or memory amounts) were mentioned for running the experiments. The paper only discusses training parameters.
Software Dependencies | No | No specific version numbers for software dependencies were provided. The paper mentions using "bert-base-uncased using the implementation from Wolf et al. (2019)" but does not specify a version for the Hugging Face Transformers library or other software. (A loading sketch appears after this table.)
Experiment Setup | Yes | "For ERM we optimize with learning rate 1e-4, weight decay 1e-4, batch size 128, and train for 50 epochs." (A training-loop sketch with these hyperparameters follows the table.)
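
The pseudocode row refers to the paper's Algorithm 1, a group-agnostic reference that applies a single confidence threshold τ to a selective classifier (ŷ, ĉ). The sketch below is not the authors' implementation; it is a minimal Python illustration (the function names and NumPy-array interface are my own) of predicting only when confidence reaches τ and then measuring accuracy and coverage separately per group, which is the quantity the paper argues can diverge across groups as τ grows:

    import numpy as np

    def accept_mask(confidences, tau):
        """Keep an example only when the model's confidence c-hat reaches the threshold tau."""
        return np.asarray(confidences) >= tau

    def per_group_selective_metrics(y_true, y_pred, confidences, groups, tau):
        """Accuracy and coverage on the accepted examples, reported separately for each group."""
        y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
        accepted = accept_mask(confidences, tau)
        metrics = {}
        for g in np.unique(groups):
            in_group = groups == g
            kept = in_group & accepted
            coverage = float(kept.sum() / in_group.sum())
            accuracy = float((y_pred[kept] == y_true[kept]).mean()) if kept.any() else float("nan")
            metrics[g] = {"coverage": coverage, "accuracy": accuracy}
        return metrics

Sweeping τ and recording each group's accuracy-coverage pair reproduces the kind of comparison the paper draws between average and worst-group behavior under selective classification.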
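For the CheXpert-device setup the authors re-split the public train and validation sets 80/10/10. The excerpt gives neither the exact procedure nor the seed, so the following is only a generic sketch of such a split, assuming examples are addressed by integer index:

    import numpy as np

    def split_80_10_10(n_examples, seed=0):
        """Shuffle example indices and cut them into 80% train, 10% val, 10% test."""
        order = np.random.default_rng(seed).permutation(n_examples)
        n_train, n_val = int(0.8 * n_examples), int(0.1 * n_examples)
        return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]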
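The paper names the bert-base-uncased checkpoint from Wolf et al. (2019) but no library version. As a hedged illustration only, the Hugging Face Transformers API loads that checkpoint as shown below; the choice of num_labels=3 is my assumption, matching MultiNLI's three-way entailment labels, and the example sentences are placeholders:

    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

    # Encode a premise/hypothesis pair and read off the classification logits.
    batch = tokenizer(["A man is playing a guitar."], ["A person plays an instrument."],
                      padding=True, truncation=True, return_tensors="pt")
    logits = model(**batch).logits  # shape: (batch_size, num_labels)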
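The reported ERM hyperparameters (learning rate 1e-4, weight decay 1e-4, batch size 128, 50 epochs) are the only training details in this excerpt. The sketch below plugs them into a generic PyTorch loop; the model, dataset, optimizer family (SGD with momentum here), and device handling are assumptions rather than the authors' configuration:

    import torch
    from torch.utils.data import DataLoader

    # Hyperparameters quoted in the paper's ERM setup.
    LR, WEIGHT_DECAY, BATCH_SIZE, EPOCHS = 1e-4, 1e-4, 128, 50

    def train_erm(model, train_set, device="cuda"):
        """Standard ERM training loop using the quoted hyperparameters."""
        loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)
        optimizer = torch.optim.SGD(model.parameters(), lr=LR,
                                    momentum=0.9, weight_decay=WEIGHT_DECAY)
        criterion = torch.nn.CrossEntropyLoss()
        model.to(device).train()
        for _ in range(EPOCHS):
            for inputs, labels in loader:
                inputs, labels = inputs.to(device), labels.to(device)
                optimizer.zero_grad()
                loss = criterion(model(inputs), labels)
                loss.backward()
                optimizer.step()
        return model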