Lost Domain Generalization Is a Natural Consequence of Lack of Training Domains

Authors: Yimu Wang, Yihan Wu, Hongyang Zhang

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the DomainBed benchmark demonstrate that o.o.d. test accuracy is monotonically increasing as the number of training domains increases. Our result sheds light on the intrinsic hardness of domain generalization and suggests benchmarking o.o.d. algorithms by the datasets with a sufficient number of training domains. In this paper, we show that, information-theoretically, one requires at least poly(1/ε) training domains in order to achieve a small excess error for any learning algorithm. This is in sharp contrast to many existing benchmarks in which the number of training domains is limited. In this section, we complement our theoretical results with an empirical study to evaluate the impact of the number of training domains.
Researcher Affiliation | Academia | Yimu Wang¹, Yihan Wu², Hongyang Zhang¹ (¹University of Waterloo, ²University of Maryland, College Park); {yimu.wang, hongyang.zhang}@uwaterloo.ca, ywu42@umd.edu
Pseudocode | No | No pseudocode or algorithm blocks are present in the paper.
Open Source Code | No | The paper states: 'We use the code repository of DomainBed with PyTorch (Paszke et al. 2019).' This refers to a third-party repository and does not indicate that the authors' own implementation code is open source.
Open Datasets | Yes | We conducted extensive experiments on two datasets from DomainBed, i.e., Colored MNIST (Arjovsky et al. 2019) and Rotated MNIST (Ghifary et al. 2015).
Dataset Splits | Yes | Inspired by the protocol introduced in DomainBed, we randomly split the training dataset into 10 sub-datasets with equal training samples. Following DomainBed, we employ and adapt three different model selection methods for training algorithms; the details are shown in the Appendix: training-domain validation set analysis, test-domain validation set (oracle) analysis, and leave-one-domain-out cross-validation analysis.
Hardware Specification | No | No specific hardware (e.g., GPU model, CPU type) used for the experiments is mentioned in the paper.
Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al. 2019)' but does not provide a specific version number for PyTorch or any other software dependency.
Experiment Setup | Yes | For each algorithm, we employ the default hyperparameters introduced in Section D.2 of DomainBed... We use a learning rate of 5e-5 while the remaining hyperparameters are kept the same. We train models using 9 different domain generalization algorithms, with a varying number of training domains, on Colored MNIST and Rotated MNIST. Each trial is done with 5 different random seeds, and we present the average results.
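
For reference, the quantitative claim quoted in the 'Research Type' row can be restated schematically; the symbols m and ε below are notation introduced here for readability, not taken from the report, and this is not the paper's formal theorem statement.

```latex
% Schematic restatement: for any learning algorithm, driving the excess
% o.o.d. error below \epsilon requires polynomially many training domains in 1/\epsilon.
\text{excess error} \;\le\; \epsilon
\quad \Longrightarrow \quad
m \;\ge\; \operatorname{poly}\!\left(1/\epsilon\right),
\qquad m = \text{number of training domains}.
```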
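
The 'Dataset Splits' row quotes a protocol of partitioning the training data into 10 equally sized sub-datasets. Below is a minimal sketch of that kind of split using plain PyTorch, with MNIST as an illustrative stand-in for DomainBed's Colored/Rotated MNIST environments; the dataset, seed, and variable names are assumptions, not the authors' code.

```python
# Sketch: partition a training set into 10 sub-datasets of equal size.
# Uses standard PyTorch utilities; the dataset and seed are illustrative only.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

full_train = datasets.MNIST(root="data", train=True, download=True,
                            transform=transforms.ToTensor())

num_subsets = 10
subset_size = len(full_train) // num_subsets
lengths = [subset_size] * num_subsets
# Put any leftover samples into an extra chunk so the lengths sum to the dataset
# size, keeping the 10 sub-datasets themselves equally sized.
remainder = len(full_train) - sum(lengths)
if remainder:
    lengths.append(remainder)

splits = random_split(full_train, lengths,
                      generator=torch.Generator().manual_seed(0))
sub_datasets = splits[:num_subsets]  # 10 sub-datasets with equal training samples
```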
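
The 'Experiment Setup' row describes a grid of 9 domain generalization algorithms, a varying number of training domains, 5 random seeds, and a learning rate of 5e-5, with results averaged over seeds. A minimal sketch of that loop follows; the algorithm list, the domain-count range, and the run_trial helper are hypothetical placeholders (the quote does not name the 9 algorithms, and a real run would go through DomainBed's training script).

```python
# Sketch of the quoted experimental grid; run_trial is a hypothetical stand-in
# for one DomainBed training run and returns a placeholder accuracy.
from statistics import mean

ALGORITHMS = ["ERM", "IRM", "GroupDRO", "Mixup", "MLDG",
              "CORAL", "MMD", "DANN", "CDANN"]  # assumed list of 9 DG algorithms
DATASETS = ["ColoredMNIST", "RotatedMNIST"]
SEEDS = range(5)                                # 5 different random seeds
LEARNING_RATE = 5e-5                            # learning rate from the quote

def run_trial(dataset, algorithm, n_train_domains, seed, lr):
    """Placeholder for a single training run (e.g., via DomainBed's train script,
    default hyperparameters except the learning rate); returns o.o.d. accuracy."""
    return 0.0  # dummy value so the sketch runs end to end

results = {}
for dataset in DATASETS:
    for algorithm in ALGORITHMS:
        # Vary the number of training domains; the exact range is an assumption.
        for n_train_domains in range(1, 10):
            accs = [run_trial(dataset, algorithm, n_train_domains, seed, LEARNING_RATE)
                    for seed in SEEDS]
            results[(dataset, algorithm, n_train_domains)] = mean(accs)  # average over seeds
```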