Selective Classification Can Magnify Disparities Across Groups
Authors: Erik Jones, Shiori Sagawa, Pang Wei Koh, Ananya Kumar, Percy Liang
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We observe this behavior consistently across five vision and NLP datasets." and "We consider five datasets (Table 1)" |
| Researcher Affiliation | Academia | Department of Computer Science, Stanford University {erjones,ssagawa,pangwei,ananya,pliang}@cs.stanford.edu |
| Pseudocode | Yes | "Algorithm 1: Group-agnostic reference for (ŷ, ĉ) at threshold τ" and "Algorithm 2: Robin Hood reference at threshold τ" (see the thresholding sketch after this table) |
| Open Source Code | Yes | All code, data, and experiments are available on CodaLab at https://worksheets.codalab.org/worksheets/0x7ceb817d53b94b0c8294a7a22643bf5e. The code is also available on GitHub at https://github.com/ejones313/worst-group-sc. |
| Open Datasets | Yes | We consider five datasets (Table 1) on which prior work has shown that models latch onto spurious correlations... CelebA. ... dataset (Liu et al., 2015). Waterbirds. ... dataset (Sagawa et al., 2020), constructed using images of birds from the Caltech-UCSD Birds dataset (Wah et al., 2011) placed on backgrounds from the Places dataset (Zhou et al., 2017). CheXpert-device. ... CheXpert dataset (Irvin et al., 2019)... CivilComments. ... dataset (Borkan et al., 2019). MultiNLI. ... MultiNLI dataset (Williams et al., 2018). |
| Dataset Splits | Yes | "We use the official train-val split of the dataset." and "we first create a new 80/10/10 train/val/test split of examples from the publicly available CheXpert train and validation sets" (see the split sketch after this table) |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, specific processors, or memory amounts) were mentioned for running experiments. The paper only discusses training parameters. |
| Software Dependencies | No | No specific version numbers for software dependencies were provided. The paper mentions using 'bert-base-uncased using the implementation from Wolf et al. (2019)' but does not specify a version for the Hugging Face Transformers library or other software. |
| Experiment Setup | Yes | For ERM we optimize with learning rate 1e-4, weight decay 1e-4, batch size 128, and train for 50 epochs. (See the ERM training sketch after this table.) |
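
The pseudocode row above refers to selective classification at a confidence threshold τ: the model emits a prediction ŷ together with a confidence ĉ and abstains whenever ĉ < τ. The sketch below is a minimal, hedged illustration of that group-agnostic thresholding; the function names and the use of the max-softmax probability as ĉ are assumptions for illustration, not the authors' exact implementation.

```python
import numpy as np

def selective_predict(probs: np.ndarray, tau: float):
    """Group-agnostic selective classification at threshold tau.

    probs: (n_examples, n_classes) softmax probabilities.
    Returns hard predictions y_hat and a boolean mask of examples the
    classifier accepts (confidence >= tau); the rest are abstentions.
    """
    y_hat = probs.argmax(axis=1)   # predicted label for every example
    c_hat = probs.max(axis=1)      # confidence score (assumed: max softmax probability)
    accept = c_hat >= tau          # predict only when confidence clears the threshold
    return y_hat, accept

def selective_accuracy(y_hat, accept, y_true):
    """Accuracy over the accepted (non-abstained) examples only."""
    if accept.sum() == 0:
        return float("nan")        # zero coverage at this threshold
    return float((y_hat[accept] == y_true[accept]).mean())
```

Evaluating `selective_accuracy` separately for each group, with the same `accept` mask, is the kind of per-group comparison behind the paper's finding that selective classification can magnify disparities across groups.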
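For the CheXpert-device setup quoted in the dataset-splits row, the authors create a new 80/10/10 train/val/test split of the publicly available CheXpert train and validation examples. The sketch below shows one generic way to produce such a split; the seed, the index-level shuffling, and the use of NumPy are assumptions, and the paper's actual split may group examples differently (e.g. by patient).

```python
import numpy as np

def split_indices(n_examples: int, seed: int = 0):
    """Shuffle example indices and carve out an 80/10/10 train/val/test split.

    Illustrative only: the paper's split of the public CheXpert
    train + validation examples may use different seeding or grouping.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_examples)
    n_train = int(0.8 * n_examples)
    n_val = int(0.1 * n_examples)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test
```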
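The experiment-setup row quotes the ERM hyperparameters: learning rate 1e-4, weight decay 1e-4, batch size 128, and 50 training epochs. A minimal PyTorch training loop wiring in those values is sketched below; the optimizer family (SGD with momentum), the model, and the dataset objects are placeholders not specified in the quoted text.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

def train_erm(model: nn.Module, train_set, device=None):
    """Standard ERM training with the hyperparameters quoted above.

    Only the learning rate, weight decay, batch size, and epoch count
    come from the quoted setup; the optimizer choice is an assumption.
    """
    device = device or ("cuda" if torch.cuda.is_available() else "cpu")
    loader = DataLoader(train_set, batch_size=128, shuffle=True)
    optimizer = torch.optim.SGD(
        model.parameters(), lr=1e-4, momentum=0.9, weight_decay=1e-4
    )
    criterion = nn.CrossEntropyLoss()
    model.to(device).train()
    for epoch in range(50):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)   # average cross-entropy over the batch
            loss.backward()
            optimizer.step()
    return model
```

Any classifier that outputs class logits (e.g. the bert-base-uncased model mentioned in the software-dependencies row, for the text datasets) can be passed in as `model`.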