Evaluating Robustness to Dataset Shift via Parametric Robustness Sets
Authors: Nikolaj Thams, Michael Oberst, David Sontag
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our approach to a computer vision task (classifying gender from images), revealing sensitivity to shifts in non-causal attributes. In a computer vision task, we find that this approach finds more impactful shifts than a reweighting approach, while taking far less time to compute, and that the resulting estimates of accuracy are substantially more reliable (see Section 4). We simulate K = 100 validation sets from P, in each estimating the worst-case shifts δ_Taylor (via the approach in Section 3.3) and δ_IS, where the latter corresponds to minimizing Ê_δ,IS using a standard non-convex solver from the scipy library [Virtanen et al., 2020]. We simulate ground truth data from P_δ_IS and P_δ_Taylor, to compare the two shifts. |
| Researcher Affiliation | Academia | Nikolaj Thams (Dept. of Mathematical Sciences, University of Copenhagen, Copenhagen, Denmark; thams@math.ku.dk); Michael Oberst (CSAIL & IMES, MIT, Cambridge, MA; moberst@mit.edu); David Sontag (CSAIL & IMES, MIT, Cambridge, MA; dsontag@csail.mit.edu) |
| Pseudocode | No | The paper provides mathematical formulations and descriptions of the approach, but does not include any explicit pseudocode blocks or sections labeled 'Algorithm'. |
| Open Source Code | Yes | Code is available at this link. |
| Open Datasets | Yes | To illustrate this use-case, we make use of the CelebA dataset [Liu et al., 2015], which contains images of faces and binary attributes (e.g., glasses, beard, etc.) encoding several features whose correlations may be unstable (e.g., the relation between gender and being bald). |
| Dataset Splits | No | We simulate K = 100 validation sets from P, in each estimating the worst-case shifts δ_Taylor (via the approach in Section 3.3) and δ_IS, where the latter corresponds to minimizing Ê_δ,IS using a standard non-convex solver from the scipy library [Virtanen et al., 2020]. (A hedged sketch of this constrained minimization appears after the table.) |
| Hardware Specification | No | The paper mentions finetuning a ResNet50 classifier but does not provide specific hardware details like GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions using a 'standard non-convex solver from the scipy library' and 'finetuning a pretrained ResNet50 classifier', but it does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | No | The paper states using a 'ResNet50 classifier' and '0/1 loss' and a constraint of the form '‖δ‖₂ ≤ λ = 2', but does not provide specific experimental setup details such as learning rates, batch sizes, optimizers, or training schedules. (A minimal fine-tuning sketch under assumed hyperparameters appears after the table.) |
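
The 'Research Type' and 'Dataset Splits' rows quote the paper's description of estimating a worst-case shift δ_IS by minimizing the importance-sampling estimate Ê_δ,IS with a standard non-convex solver from scipy. The snippet below is a minimal sketch of that kind of constrained minimization, not the authors' implementation: the Gaussian mean-shift weight model, the stand-in classifier, and the radius λ = 2 (taken from the constraint quoted in the 'Experiment Setup' row) are all illustrative assumptions.

```python
# Hedged sketch (not the authors' code): find a worst-case shift delta_IS by
# minimizing an importance-sampling estimate of accuracy over the ball
# ||delta||_2 <= lambda, using scipy.optimize.minimize with SLSQP.
import numpy as np
from scipy import optimize
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy validation set drawn from the unshifted distribution P: a covariate
# W ~ N(0, 1), a label Y correlated with W, and a 0/1 indicator of whether a
# deliberately miscalibrated stand-in classifier predicted Y correctly.
n = 5_000
w = rng.normal(0.0, 1.0, size=n)
y = (w + rng.normal(0.0, 1.0, size=n) > 0).astype(int)
pred = (w > 1.0).astype(int)            # stand-in classifier with a bad threshold
correct = (pred == y).astype(float)     # 0/1 loss flipped to "is correct"

lam = 2.0  # assumed radius of the robustness set, ||delta||_2 <= lambda


def is_accuracy(delta):
    """Importance-sampling estimate of accuracy under the shifted P_delta.

    Assumes (for illustration only) that the shift moves the mean of W from 0
    to delta[0], so the weight is the density ratio N(w; delta, 1) / N(w; 0, 1).
    """
    weights = norm.pdf(w, loc=delta[0], scale=1.0) / norm.pdf(w, loc=0.0, scale=1.0)
    return float(np.mean(weights * correct))


# Worst case = lowest estimated accuracy within the ball ||delta||_2 <= lambda.
constraints = [{"type": "ineq", "fun": lambda d: lam - np.linalg.norm(d)}]
result = optimize.minimize(is_accuracy, x0=np.zeros(1), method="SLSQP",
                           constraints=constraints)
print(f"delta_IS = {result.x}, estimated worst-case accuracy = {result.fun:.3f}")
```

In this toy, the solver drives the shift toward the region where the stand-in classifier is most often wrong; to mirror the paper one would replace the Gaussian density ratio with the weights implied by its parametric shift model, keeping the same `scipy.optimize.minimize` interface.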
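Similarly, the 'Experiment Setup' and 'Software Dependencies' rows note that the paper fine-tunes a pretrained ResNet50 on CelebA gender labels without reporting optimizer, learning rate, batch size, or schedule. The sketch below fills those gaps with assumed values (Adam, learning rate 1e-4, batch size 64, one epoch) purely for illustration, using the standard torchvision CelebA loader; it should not be read as the authors' training configuration.

```python
# Hedged sketch: fine-tuning a pretrained ResNet50 on the CelebA "Male"
# attribute with torchvision. All hyperparameters below are assumptions; the
# paper does not report them.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# CelebA targets with target_type="attr" are 40-dim binary vectors;
# index 20 is the "Male" attribute in the standard ordering.
train_set = datasets.CelebA(root="data", split="train", target_type="attr",
                            transform=transform, download=True)
loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, 2)  # replace the ImageNet head
model = model.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed settings
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(1):  # single epoch, for illustration only
    for images, attrs in loader:
        labels = attrs[:, 20].long()  # gender label from the attribute vector
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```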