Towards Last-layer Retraining for Group Robustness with Fewer Annotations
Authors: Tyler LaBonte, Vidya Muthukumar, Abhishek Kumar
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical and theoretical results present the first evidence that model disagreement upsamples worst-group data, enabling SELF to nearly match DFR on four well-established benchmarks across vision and language tasks with no group annotations and less than 3% of the held-out class annotations. |
| Researcher Affiliation | Collaboration | ¹H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology; ²School of Electrical and Computer Engineering, Georgia Institute of Technology; ³Google DeepMind |
| Pseudocode | No | The paper describes methods in narrative text and does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured, code-like steps for any procedure. |
| Open Source Code | Yes | Our code is available at https://github.com/tmlabonte/last-layer-retraining. |
| Open Datasets | Yes | We study four datasets which are well-established as benchmarks for group robustness across vision and language tasks, detailed in Table 2 and summarized below. Waterbirds [71, 69, 58] is an image classification dataset... CelebA [42, 58] is an image classification dataset... CivilComments [7, 35] is a text classification dataset... MultiNLI [73, 58] is a text classification dataset... |
| Dataset Splits | Yes | Table 2: Dataset composition. ... Train Val Test ... Following previous work, we use half the validation set for feature reweighting [33, 28] and half for model selection with group annotations [58, 41, 33, 48, 28]. |
| Hardware Specification | Yes | Our experiments were conducted on Nvidia Tesla V100 and A5000 GPUs. |
| Software Dependencies | No | The paper lists several software packages (e.g., 'NumPy [22], PyTorch [53], Lightning [72], TorchVision [44], Matplotlib [26], Transformers [74], and Milkshake [36]') but does not explicitly provide specific version numbers for these dependencies, relying instead on citations to general resources or past conference papers. |
| Experiment Setup | Yes | Table 11: ERM and last-layer retraining hyperparameters. We use standard hyperparameters following previous work [58, 27, 33, 28]. For last-layer retraining, we keep all hyperparameters the same except the number of epochs on CelebA, which we increase to 100. |
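The last-layer retraining step the table refers to can be illustrated with a minimal sketch: freeze the pretrained backbone, extract features for a held-out split, and refit only a linear classifier on those features. This is a hypothetical toy example using scikit-learn and synthetic features standing in for frozen backbone embeddings; it is not the paper's actual pipeline, hyperparameters, or datasets.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for embeddings produced by a frozen backbone
# (in practice these would come from, e.g., a frozen ResNet on held-out data).
rng = np.random.default_rng(0)
n_samples, feat_dim = 200, 16
feats = rng.normal(size=(n_samples, feat_dim))
w_true = rng.normal(size=feat_dim)
labels = (feats @ w_true > 0).astype(int)  # linearly separable toy labels

# "Last-layer retraining": fit only a linear head on the frozen features.
head = LogisticRegression(C=1.0, max_iter=1000)
head.fit(feats, labels)
acc = head.score(feats, labels)
```

Because only the linear head is optimized, this step is cheap relative to full fine-tuning, which is why it can be repeated on a small held-out split for feature reweighting.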