Adaptive Risk Minimization: Learning to Adapt to Domain Shift
Authors: Marvin Zhang, Henrik Marklund, Nikita Dhawan, Abhishek Gupta, Sergey Levine, Chelsea Finn
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Compared to prior methods for robustness, invariance, and adaptation, ARM methods provide performance gains of 1-4% test accuracy on a number of image classification problems exhibiting domain shift. Our experiments in Section 5 test on several image classification problems, derived from benchmarks for federated learning [9] and image classifier robustness [25], in which training and test domains share structure that can be leveraged for improved performance. The results for the four proposed benchmarks are presented in Table 1. |
| Researcher Affiliation | Academia | Marvin Zhang¹, Henrik Marklund², Nikita Dhawan¹, Abhishek Gupta¹, Sergey Levine¹, Chelsea Finn²; ¹UC Berkeley, ²Stanford University |
| Pseudocode | Yes | Algorithm 1: Meta-Learning for ARM (a minimal code sketch of this procedure appears after the table). |
| Open Source Code | Yes | These results are reproducible from the publicly available code: https://github.com/henrikmarklund/arm. |
| Open Datasets | Yes | Rotated MNIST. We study a modified version of MNIST... Federated Extended MNIST (FEMNIST). The extended MNIST (EMNIST) dataset consists of images of handwritten uppercase and lowercase letters, in addition to digits [12]. FEMNIST is the same dataset, but it also provides the meta-data of which user generated each data point [9]. Corrupted image datasets. CIFAR-10-C and Tiny ImageNet-C [25] augment the CIFAR-10 [36] and Tiny ImageNet test sets with common image corruptions... We also present results on datasets from the WILDS benchmark [35] in subsection 5.4. (An illustrative domain-construction sketch for Rotated MNIST appears after the table.) |
| Dataset Splits | No | The paper describes the organization of training and test data by "domains" and mentions "training data points" and "test set", but does not explicitly provide specific train/validation/test dataset splits (e.g., percentages, sample counts, or citations to pre-defined splits) for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch [52]' but does not provide a specific version number for PyTorch or any other software dependencies. |
| Experiment Setup | No | The paper describes the general evaluation protocol and how domains are structured but does not provide specific experimental setup details such as concrete hyperparameter values (e.g., learning rates, specific batch sizes for training, optimizer settings) in the main text. |
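For reference, here is a minimal PyTorch sketch in the spirit of Algorithm 1 (Meta-Learning for ARM), using a contextual (ARM-CML style) adaptation step: a context network summarizes an unlabeled batch from one domain, and the prediction network conditions on that summary. The network sizes, learning rate, and flattened-MNIST input format are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative networks (sizes are assumptions, not the paper's architecture):
# a context net that embeds an unlabeled batch, and a prediction net that
# consumes each input concatenated with the batch-averaged context.
context_net = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 16))
pred_net = nn.Sequential(nn.Linear(784 + 16, 128), nn.ReLU(), nn.Linear(128, 10))
opt = torch.optim.Adam(
    list(context_net.parameters()) + list(pred_net.parameters()), lr=1e-4
)

def meta_train_step(x, y):
    """One ARM-CML style meta-training step on a batch from a single domain.

    x: (batch, 784) flattened images from one training domain; y: (batch,) labels.
    The context is computed from the unlabeled batch, mirroring what is
    available at test time; the post-adaptation loss trains both networks.
    """
    ctx = context_net(x).mean(dim=0, keepdim=True)   # summarize the domain batch
    ctx = ctx.expand(x.shape[0], -1)                 # broadcast context per example
    logits = pred_net(torch.cat([x, ctx], dim=1))
    loss = F.cross_entropy(logits, y)
    opt.zero_grad()
    loss.backward()                                  # end to end through adaptation
    opt.step()
    return loss.item()
```

At test time the same context computation runs on an unlabeled batch from the new domain, which is what lets the meta-trained model adapt without labels.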
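Similarly, a hypothetical sketch of how Rotated MNIST domains could be constructed, where each rotation angle defines one domain. The angle set, batch size, and per-domain sampling scheme below are assumptions for illustration, not the paper's exact protocol.

```python
import torch
from torchvision import datasets, transforms
from torchvision.transforms import functional as TF

# Assumed angle set: one domain per rotation angle (the paper's exact angles
# and per-domain data allocation may differ).
ANGLES = list(range(0, 140, 10))

mnist = datasets.MNIST(root="data", train=True, download=True,
                       transform=transforms.ToTensor())

def sample_domain_batch(domain_idx, batch_size=50):
    """Draw a random MNIST batch and rotate every image by the domain's angle."""
    idx = torch.randint(len(mnist), (batch_size,))
    images = torch.stack(
        [TF.rotate(mnist[int(i)][0], ANGLES[domain_idx]) for i in idx]
    )
    labels = torch.tensor([mnist[int(i)][1] for i in idx])
    return images, labels
```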