Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Diverse Weight Averaging for Out-of-Distribution Generalization
Authors: Alexandre Rame, Matthieu Kirchmeyer, Thibaud Rahier, Alain Rakotomamonjy, Patrick Gallinari, Matthieu Cord
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, DiWA consistently improves the state of the art on DomainBed without inference overhead. We now present our evaluation on DomainBed [12]. By imposing the code, the training procedures and the ResNet50 [52] architecture, DomainBed is arguably the fairest benchmark for OOD generalization. It includes 5 multi-domain real-world datasets: PACS [51], VLCS [53], OfficeHome [50], TerraIncognita [54] and DomainNet [55]. |
| Researcher Affiliation | Collaboration | ¹Sorbonne Université, CNRS, ISIR, F-75005 Paris, France; ²Criteo AI Lab, Paris, France; ³Valeo.ai, Paris, France; ⁴Université de Rouen, LITIS, France |
| Pseudocode | Yes | Algorithm 1: DiWA pseudo-code |
| Open Source Code | Yes | Our code is available at https://github.com/alexrame/diwa. |
| Open Datasets | Yes | We now present our evaluation on DomainBed [12]. By imposing the code, the training procedures and the ResNet50 [52] architecture, DomainBed is arguably the fairest benchmark for OOD generalization. It includes 5 multi-domain real-world datasets: PACS [51], VLCS [53], OfficeHome [50], TerraIncognita [54] and DomainNet [55]. |
| Dataset Splits | Yes | The validation dataset is sampled from S, i.e., we follow DomainBed's training-domain model selection. |
| Hardware Specification | Yes | Approximately 20000 hours of GPUs (Nvidia V100) on an internal cluster, mostly for the 2640 runs needed in Table 1. |
| Software Dependencies | No | The paper does not provide specific software version numbers for ancillary software dependencies. |
| Experiment Setup | Yes | The experimental setup is further described in Appendix G.1. In our experiments, we thus use the mild search space defined in Table 7, first introduced in SWAD [14]. |
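The table notes that the paper provides pseudocode (Algorithm 1) for DiWA and claims no inference overhead. As a rough, non-authoritative illustration of the core operation implied by the method's name, the sketch below uniformly averages the parameters of several models with identical architecture into a single model, θ = (1/M) Σₘ θₘ, so inference still runs only one network. Plain Python dicts stand in for real model state dicts; `average_weights` and the toy runs are hypothetical. This is not the authors' implementation (see the linked repository for that), and it omits how the runs are trained and any selection of which runs to average.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat list of floats.
StateDict = Dict[str, List[float]]


def average_weights(models: List[StateDict]) -> StateDict:
    """Uniformly average the parameters of several models.

    Assumes all models share one architecture (identical parameter names
    and sizes), e.g. because they were fine-tuned from a shared initialization.
    """
    if not models:
        raise ValueError("need at least one model to average")
    keys = models[0].keys()
    for m in models[1:]:
        if m.keys() != keys:
            raise ValueError("all models must have identical parameter names")
    averaged: StateDict = {}
    for name in keys:
        vectors = [m[name] for m in models]
        length = len(vectors[0])
        if any(len(v) != length for v in vectors):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Element-wise mean across the M models: theta = (1/M) * sum_m theta_m
        averaged[name] = [sum(vals) / len(models) for vals in zip(*vectors)]
    return averaged


# Usage: three hypothetical runs of a tiny two-parameter-group model.
runs = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [3.0]},
    {"w": [2.0, 0.0], "b": [0.0]},
]
merged = average_weights(runs)
print(merged)  # {'w': [2.0, 2.0], 'b': [1.0]}
```

Averaging in weight space (rather than averaging predictions of an ensemble) is what keeps the result a single model at test time; it is only meaningful when the models' weights are compatible, which is why the sketch assumes a shared initialization.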