Domain Generalization without Excess Empirical Risk
Authors: Ozan Sener, Vladlen Koltun
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform our primary empirical evaluation on the CORAL model due to its simplicity [42]. We evaluate on the large-scale domain generalization problems in the WILDS benchmark of real distribution shifts [23]. Our evaluation using the image, formal language, natural language, and graph modalities suggests that our method significantly improves performance. |
| Researcher Affiliation | Industry | Ozan Sener Apple Vladlen Koltun Apple |
| Pseudocode | Yes | We present a more detailed discussion in Appendix C.1 & C.2 with complete derivation and give pseudocode in Algorithm 1. |
| Open Source Code | No | The paper states "We implement the proposed algorithm in PyTorch" but does not provide a link or an explicit statement that the code for its method is released. |
| Open Datasets | Yes | Our main evaluation is using the WILDS benchmark [23]. WILDS is designed to evaluate domain generalization and subpopulation shift in realistic problems. Among these problems, we use seven problems that are either pure domain generalization problems or a combination of domain generalization and subpopulation shift. These problems cover a wide range, including outdoor images (iWildCam), medical images (Camelyon), satellite images (FMoW, PovertyMap), natural language (Amazon), formal language (Py150), and graph-structured data (OGB-MolPCBA), with the number of domains ranging from 5 to 120K. The summary of the benchmarks is in Appendix E. |
| Dataset Splits | Yes | We also perform early stopping and choose the best epoch using validation domain performance. |
| Hardware Specification | Yes | Table 5: Wall-clock time (in minutes) for training a single epoch for CORAL and CORAL+SDG method on an RTX 3090 GPU. |
| Software Dependencies | No | The paper states "We implement the proposed algorithm in PyTorch" but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We treat β as a hyperparameter and search over β ∈ {0.01, 0.1, 1.0} together with all other hyperparameters. In all of our experiments, we used 25 BA iterations. Since using all domains at each batch is not feasible, we first sample B_D domains, then sample B examples per domain (called the group sampler in WILDS [23]). We search all hyperparameters with random sampling under the constraint that all methods have the same budget for the hyperparameter search. Specifically, we use 20 random hyperparameter choices. We also perform early stopping and choose the best epoch using validation-domain performance. Moreover, to ensure a fair comparison, all methods are run for the same amount of wall-clock time. Since our method is slower, we perform fewer epochs than other methods. We list all chosen hyperparameters in Appendix F. |
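
The batching and search procedure quoted in the Experiment Setup row can be sketched in code. Below is a minimal Python illustration, not the authors' implementation: the names `GroupSampler`, `sample_config`, `random_search`, and `train_and_validate` are hypothetical, and the learning-rate range is a placeholder standing in for the unstated remainder of the search space.

```python
import random

class GroupSampler:
    """Sketch of the WILDS-style group sampler described above: each batch is
    built by first sampling B_D domains, then B examples per domain."""

    def __init__(self, domain_to_indices, domains_per_batch,
                 examples_per_domain, num_batches):
        self.domain_to_indices = domain_to_indices      # {domain_id: [example indices]}
        self.domains_per_batch = domains_per_batch      # B_D in the quoted text
        self.examples_per_domain = examples_per_domain  # B in the quoted text
        self.num_batches = num_batches

    def __iter__(self):
        domains = list(self.domain_to_indices)
        for _ in range(self.num_batches):
            batch = []
            # Assumes every domain holds at least `examples_per_domain` examples.
            for d in random.sample(domains, self.domains_per_batch):
                batch += random.sample(self.domain_to_indices[d],
                                       self.examples_per_domain)
            yield batch

    def __len__(self):
        return self.num_batches


def sample_config():
    # One random draw from the search space; beta is searched jointly with the
    # other hyperparameters (learning rate shown only as a placeholder).
    return {"beta": random.choice([0.01, 0.1, 1.0]),
            "lr": 10 ** random.uniform(-5, -3)}


def random_search(train_and_validate, num_trials=20):
    """Random search with a fixed budget of 20 trials, as in the quoted setup.
    `train_and_validate(config)` is assumed to train with early stopping and
    return the best validation-domain performance across epochs."""
    best_score, best_config = float("-inf"), None
    for _ in range(num_trials):
        config = sample_config()
        score = train_and_validate(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score
```

Holding the trial budget (and, per the quoted text, wall-clock time) equal across methods is what makes the comparison fair; the sketch fixes the budget via `num_trials=20` to mirror that constraint.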