Adaptive Risk Minimization: Learning to Adapt to Domain Shift
Authors: Marvin Zhang, Henrik Marklund, Nikita Dhawan, Abhishek Gupta, Sergey Levine, Chelsea Finn
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Compared to prior methods for robustness, invariance, and adaptation, ARM methods provide performance gains of 1-4% test accuracy on a number of image classification problems exhibiting domain shift. Our experiments in Section 5 test on several image classification problems, derived from benchmarks for federated learning [9] and image classifier robustness [25], in which training and test domains share structure that can be leveraged for improved performance. The results for the four proposed benchmarks are presented in Table 1. |
| Researcher Affiliation | Academia | Marvin Zhang¹, Henrik Marklund², Nikita Dhawan¹, Abhishek Gupta¹, Sergey Levine¹, Chelsea Finn²; ¹UC Berkeley, ²Stanford University |
| Pseudocode | Yes | Algorithm 1: Meta-Learning for ARM (a minimal code sketch of this procedure appears after the table). |
| Open Source Code | Yes | These results are reproducible from the publicly available code: https://github.com/henrikmarklund/arm. |
| Open Datasets | Yes | Rotated MNIST. We study a modified version of MNIST... Federated Extended MNIST (FEMNIST). The extended MNIST (EMNIST) dataset consists of images of handwritten uppercase and lowercase letters, in addition to digits [12]. FEMNIST is the same dataset, but it also provides the meta-data of which user generated each data point [9]. Corrupted image datasets. CIFAR-10-C and Tiny ImageNet-C [25] augment the CIFAR-10 [36] and Tiny ImageNet test sets with common image corruptions... We also present results on datasets from the WILDS benchmark [35] in subsection 5.4. (An illustrative domain-construction sketch for Rotated MNIST appears after the table.) |
| Dataset Splits | No | The paper describes the organization of training and test data by "domains" and mentions "training data points" and "test set", but does not explicitly provide specific train/validation/test dataset splits (e.g., percentages, sample counts, or citations to pre-defined splits) for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch [52]' but does not provide a specific version number for PyTorch or any other software dependencies. |
| Experiment Setup | No | The paper describes the general evaluation protocol and how domains are structured but does not provide specific experimental setup details such as concrete hyperparameter values (e.g., learning rates, specific batch sizes for training, optimizer settings) in the main text. |
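For reference, here is a minimal PyTorch sketch in the spirit of Algorithm 1 (Meta-Learning for ARM), using a contextual (ARM-CML style) adaptation step: a context network summarizes an unlabeled batch from one domain, and the prediction network conditions on that summary. The network sizes, learning rate, and flattened-MNIST input format are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative networks (sizes are assumptions, not the paper's architecture):
# a context net that embeds an unlabeled batch, and a prediction net that
# consumes each input concatenated with the batch-averaged context.
context_net = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 16))
pred_net = nn.Sequential(nn.Linear(784 + 16, 128), nn.ReLU(), nn.Linear(128, 10))
opt = torch.optim.Adam(
    list(context_net.parameters()) + list(pred_net.parameters()), lr=1e-4
)

def meta_train_step(x, y):
    """One ARM-CML style meta-training step on a batch from a single domain.

    x: (batch, 784) flattened images from one training domain; y: (batch,) labels.
    The context is computed from the unlabeled batch, mirroring what is
    available at test time; the post-adaptation loss trains both networks.
    """
    ctx = context_net(x).mean(dim=0, keepdim=True)   # summarize the domain batch
    ctx = ctx.expand(x.shape[0], -1)                 # broadcast context per example
    logits = pred_net(torch.cat([x, ctx], dim=1))
    loss = F.cross_entropy(logits, y)
    opt.zero_grad()
    loss.backward()                                  # end to end through adaptation
    opt.step()
    return loss.item()
```

At test time the same context computation runs on an unlabeled batch from the new domain, which is what lets the meta-trained model adapt without labels.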
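Similarly, a hypothetical sketch of how Rotated MNIST domains could be constructed, where each rotation angle defines one domain. The angle set, batch size, and per-domain sampling scheme below are assumptions for illustration, not the paper's exact protocol.

```python
import torch
from torchvision import datasets, transforms
from torchvision.transforms import functional as TF

# Assumed angle set: one domain per rotation angle (the paper's exact angles
# and per-domain data allocation may differ).
ANGLES = list(range(0, 140, 10))

mnist = datasets.MNIST(root="data", train=True, download=True,
                       transform=transforms.ToTensor())

def sample_domain_batch(domain_idx, batch_size=50):
    """Draw a random MNIST batch and rotate every image by the domain's angle."""
    idx = torch.randint(len(mnist), (batch_size,))
    images = torch.stack(
        [TF.rotate(mnist[int(i)][0], ANGLES[domain_idx]) for i in idx]
    )
    labels = torch.tensor([mnist[int(i)][1] for i in idx])
    return images, labels
```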