Domain Generalization without Excess Empirical Risk
Authors: Ozan Sener, Vladlen Koltun
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform our primary empirical evaluation on the CORAL model due to its simplicity [42]. We evaluate on the large-scale domain generalization problems in the WILDS benchmark of real distribution shifts [23]. Our evaluation using the image, formal language, natural language, and graph modalities suggests that our method significantly improves performance. |
| Researcher Affiliation | Industry | Ozan Sener Apple Vladlen Koltun Apple |
| Pseudocode | Yes | We present a more detailed discussion in Appendix C.1 & C.2 with complete derivation and give pseudocode in Algorithm 1. |
| Open Source Code | No | The paper states "We implement the proposed algorithm in PyTorch" but does not provide a link or an explicit statement that the code for its method is released. |
| Open Datasets | Yes | Our main evaluation is using the WILDS benchmark [23]. WILDS is designed to evaluate domain generalization and subpopulation shift in realistic problems. Among these problems, we use seven problems that are either pure domain generalization problems or a combination of domain generalization and subpopulation shift. These problems cover a wide range, including outdoor images (iWildCam), medical images (Camelyon), satellite images (FMoW, PovertyMap), natural language (Amazon), formal language (Py150), and graph-structured data (OGB-MolPCBA), with the number of domains ranging from 5 to 120K. The summary of the benchmarks is in Appendix E. |
| Dataset Splits | Yes | We also perform early stopping and choose the best epoch using validation domain performance. |
| Hardware Specification | Yes | Table 5: Wall-clock time (in minutes) for training a single epoch for CORAL and CORAL+SDG method on an RTX 3090 GPU. |
| Software Dependencies | No | The paper states "We implement the proposed algorithm in PyTorch" but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We treat β as a hyperparameter and search over β ∈ {0.01, 0.1, 1.0} together with all other hyperparameters. In all of our experiments, we used 25 BA iterations. Since using all domains at each batch is not feasible, we first sample B_D domains, then sample B examples per domain (called the group sampler in WILDS [23]). We search all hyperparameters with random sampling under the constraint that all methods have the same budget for the hyperparameter search. Specifically, we use 20 random hyperparameter choices. We also perform early stopping and choose the best epoch using validation-domain performance. Moreover, to ensure a fair comparison, all methods are run for the same amount of wall-clock time. Since our method is slower, we perform fewer epochs than other methods. We list all chosen hyperparameters in Appendix F. |
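
The batching and search procedure quoted in the Experiment Setup row can be sketched in code. Below is a minimal Python illustration, not the authors' implementation: the names `GroupSampler`, `sample_config`, `random_search`, and `train_and_validate` are hypothetical, and the learning-rate range is a placeholder standing in for the unstated remainder of the search space.

```python
import random

class GroupSampler:
    """Sketch of the WILDS-style group sampler described above: each batch is
    built by first sampling B_D domains, then B examples per domain."""

    def __init__(self, domain_to_indices, domains_per_batch,
                 examples_per_domain, num_batches):
        self.domain_to_indices = domain_to_indices      # {domain_id: [example indices]}
        self.domains_per_batch = domains_per_batch      # B_D in the quoted text
        self.examples_per_domain = examples_per_domain  # B in the quoted text
        self.num_batches = num_batches

    def __iter__(self):
        domains = list(self.domain_to_indices)
        for _ in range(self.num_batches):
            batch = []
            # Assumes every domain holds at least `examples_per_domain` examples.
            for d in random.sample(domains, self.domains_per_batch):
                batch += random.sample(self.domain_to_indices[d],
                                       self.examples_per_domain)
            yield batch

    def __len__(self):
        return self.num_batches


def sample_config():
    # One random draw from the search space; beta is searched jointly with the
    # other hyperparameters (learning rate shown only as a placeholder).
    return {"beta": random.choice([0.01, 0.1, 1.0]),
            "lr": 10 ** random.uniform(-5, -3)}


def random_search(train_and_validate, num_trials=20):
    """Random search with a fixed budget of 20 trials, as in the quoted setup.
    `train_and_validate(config)` is assumed to train with early stopping and
    return the best validation-domain performance across epochs."""
    best_score, best_config = float("-inf"), None
    for _ in range(num_trials):
        config = sample_config()
        score = train_and_validate(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score
```

Holding the trial budget (and, per the quoted text, wall-clock time) equal across methods is what makes the comparison fair; the sketch fixes the budget via `num_trials=20` to mirror that constraint.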