Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization

Authors: Alexandre Ramé, Corentin Dancette, Matthieu Cord

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the effectiveness of Fishr for out-of-distribution generalization. Notably, Fishr improves the state of the art on the DomainBed benchmark and performs consistently better than Empirical Risk Minimization. Our code is available at https://github.com/alexrame/fishr.
Researcher Affiliation | Collaboration | 1Sorbonne Université, CNRS, LIP6, Paris, France; 2Valeo.ai. Correspondence to: Alexandre Ramé <alexandre.rame@sorbonne-universite.fr>.
Pseudocode | Yes | Algorithm 1: Training procedure for Fishr on DomainBed. (A hedged sketch of this procedure appears below the table.)
Open Source Code | Yes | Our code is available at https://github.com/alexrame/fishr.
Open Datasets | Yes | We conduct extensive experiments on the DomainBed benchmark (Gulrajani & Lopez-Paz, 2021). In addition to the synthetic Colored MNIST (Arjovsky et al., 2019) and Rotated MNIST (Ghifary et al., 2015), the multi-domain image classification datasets are the real VLCS (Fang et al., 2013), PACS (Li et al., 2017), OfficeHome (Venkateswara et al., 2017), Terra Incognita (Beery et al., 2018) and DomainNet (Peng et al., 2019).
Dataset Splits | Yes | The data from each domain is split into 80% (used for training and testing) and 20% (used as validation for hyperparameter selection) splits. (An illustrative split sketch appears below the table.)
Hardware Specification | Yes | For example, on PACS (7 classes and |ω| = 14,343) with a ResNet-50 and batch size 32, Fishr induces an overhead in memory of +0.2% and in training time of +2.7% (with a Tesla V100) compared to ERM. (A quick parameter-count check of |ω| appears below the table.)
Software Dependencies | No | The paper mentions using PyTorch and the BackPACK package but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | To limit access to the test domain, the framework enforces that all methods are trained with only 20 different configurations of hyperparameters and for the same number of steps. Results are averaged over three trials. This experimental setup is further described in Appendix D.1.
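
To make the Pseudocode row more concrete, below is a minimal sketch of the Fishr idea as described in the paper: match the per-domain variance of per-sample classifier gradients across training domains. It is a simplification under stated assumptions, not the authors' implementation: the `featurizer`/`classifier` split, the plain autograd loop for per-sample gradients (the paper uses BackPACK for efficiency), and the omission of details such as the exponential moving average of the variances are all illustrative choices.

```python
import torch
import torch.nn.functional as F


def per_sample_classifier_grads(features, targets, classifier):
    """Per-example gradient of the loss w.r.t. the classifier parameters."""
    grads = []
    for x, y in zip(features, targets):
        loss = F.cross_entropy(classifier(x.unsqueeze(0)), y.unsqueeze(0))
        # create_graph=True so the penalty below stays differentiable
        g = torch.autograd.grad(loss, list(classifier.parameters()), create_graph=True)
        grads.append(torch.cat([p.flatten() for p in g]))
    return torch.stack(grads)  # shape: (n_samples, |omega|)


def fishr_penalty(domain_variances):
    """Mean squared distance between each domain's gradient variance and the mean."""
    mean_var = torch.stack(domain_variances).mean(dim=0)
    return torch.stack([(v - mean_var).pow(2).sum() for v in domain_variances]).mean()


def fishr_loss(featurizer, classifier, domain_batches, lam):
    """ERM loss averaged over domains plus lambda times the Fishr penalty."""
    erm_losses, variances = [], []
    for x, y in domain_batches:  # one mini-batch per training domain
        feats = featurizer(x)
        erm_losses.append(F.cross_entropy(classifier(feats), y))
        grads = per_sample_classifier_grads(feats, y, classifier)
        variances.append(grads.var(dim=0))  # element-wise variance over the batch
    return torch.stack(erm_losses).mean() + lam * fishr_penalty(variances)
```

In a training loop, `fishr_loss(...)` would replace the ERM objective before calling `.backward()`; the double backward through the per-sample gradients is what lets the featurizer respond to domain-level gradient statistics.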
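
The |ω| = 14,343 figure in the Hardware Specification row can be sanity-checked with a quick count, assuming the standard 2048-dimensional ResNet-50 features feeding a linear classifier over the 7 PACS classes:

```python
# Back-of-the-envelope check of |omega| = 14,343 on PACS: a linear classifier
# on top of 2048-dimensional ResNet-50 features, over 7 classes.
n_features, n_classes = 2048, 7
n_classifier_params = n_features * n_classes + n_classes  # weights + biases
assert n_classifier_params == 14_343
```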
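
Finally, the Dataset Splits and Experiment Setup rows describe the DomainBed protocol: each domain is split 80/20, with the 20% held out for hyperparameter selection, under 20 random hyperparameter configurations, a fixed number of steps, and results averaged over three trials. Below is a small, self-contained sketch of the per-domain 80/20 split; the tensor datasets are illustrative stand-ins, not the DomainBed loaders.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Illustrative stand-ins for the per-domain datasets (e.g. three training domains).
domain_datasets = [
    TensorDataset(torch.randn(100, 3, 28, 28), torch.randint(0, 7, (100,)))
    for _ in range(3)
]

splits = []
for seed, dataset in enumerate(domain_datasets):
    n_train = int(0.8 * len(dataset))                   # 80% used for training and testing
    train_part, val_part = random_split(
        dataset,
        [n_train, len(dataset) - n_train],              # remaining 20% for hyperparameter selection
        generator=torch.Generator().manual_seed(seed),  # reproducible split
    )
    splits.append((train_part, val_part))
```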