Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization
Authors: Alexandre Ramé, Corentin Dancette, Matthieu Cord
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of Fishr for out-of-distribution generalization. Notably, Fishr improves the state of the art on the DomainBed benchmark and performs consistently better than Empirical Risk Minimization. Our code is available at https://github.com/alexrame/fishr. |
| Researcher Affiliation | Collaboration | 1Sorbonne Université, CNRS, LIP6, Paris, France 2Valeo.ai. Correspondence to: Alexandre Ramé <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Training procedure for Fishr on DomainBed. |
| Open Source Code | Yes | Our code is available at https://github.com/alexrame/fishr. |
| Open Datasets | Yes | We conduct extensive experiments on the DomainBed benchmark (Gulrajani & Lopez-Paz, 2021). In addition to the synthetic Colored MNIST (Arjovsky et al., 2019) and Rotated MNIST (Ghifary et al., 2015), the multi-domain image classification datasets are the real VLCS (Fang et al., 2013), PACS (Li et al., 2017), OfficeHome (Venkateswara et al., 2017), Terra Incognita (Beery et al., 2018) and DomainNet (Peng et al., 2019). |
| Dataset Splits | Yes | The data from each domain is split into 80% (used as training and testing) and 20% (used as validation for hyperparameter selection) splits. |
| Hardware Specification | Yes | For example, on PACS (7 classes and |ω| = 14,343) with a ResNet-50 and batch size 32, Fishr induces an overhead in memory of +0.2% and in training time of +2.7% (with a Tesla V100) compared to ERM. |
| Software Dependencies | No | The paper mentions using PyTorch and the BackPACK package but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | To limit access to test domain, the framework enforces that all methods are trained with only 20 different configurations of hyperparameters and for the same number of steps. Results are averaged over three trials. This experimental setup is further described in Appendix D.1. |
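To make the idea named in the paper's title concrete, the sketch below illustrates gradient-variance matching across training domains: each domain's per-sample gradient variance is pulled toward the cross-domain average. This is a minimal NumPy illustration, not the authors' implementation (which uses PyTorch with BackPACK); the function name and interface are hypothetical.

```python
import numpy as np

def fishr_penalty(per_sample_grads_by_domain):
    """Illustrative sketch of a gradient-variance matching penalty.

    per_sample_grads_by_domain: list of arrays, one per training domain,
    each of shape (n_samples_d, n_params) holding per-sample gradients.
    Returns the mean squared distance between each domain's
    gradient-variance vector and the cross-domain average variance.
    """
    # Per-domain variance of the per-sample gradients (one vector per domain).
    variances = [g.var(axis=0) for g in per_sample_grads_by_domain]
    # Average variance vector across domains.
    v_mean = np.mean(variances, axis=0)
    # Penalize deviation of each domain's variance from the average.
    return float(np.mean([np.sum((v - v_mean) ** 2) for v in variances]))
```

In training, a penalty of this form would be added to the empirical risk with a regularization weight, so that minimizing it drives the gradient variances, and hence the domain-level loss landscapes, toward invariance across domains.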