DNA: Domain Generalization with Diversified Neural Averaging

Authors: Xu Chu, Yujie Jin, Wenwu Zhu, Yasha Wang, Xin Wang, Shanghang Zhang, Hong Mei

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, the proposed DNA method achieves the state-of-the-art classification performance on standard DG benchmark datasets.
Researcher Affiliation | Academia | Tsinghua University; Peking University.
Pseudocode | Yes | Algorithm 1: DNA: DG with Diversified Neural Averaging.
Open Source Code | Yes | Codes are available at https://github.com/JinYujie99/DNA
Open Datasets | Yes | Datasets. Following Gulrajani & Lopez-Paz (2021), we exhaustively conduct experiments on various benchmark datasets to validate the proposed DNA: PACS (Li et al., 2017), VLCS (Fang et al., 2013), Office-Home (Venkateswara et al., 2017), Terra Incognita (Beery et al., 2018) and DomainNet (Peng et al., 2019).
Dataset Splits | Yes | We split each source domain into 8:2 training/validation splits and pool the per-domain validation subsets into one overall validation set, which is used for validation and model selection (a minimal code sketch of this split appears after the table).
Hardware Specification | Yes | We perform our experiments on three machines: two with 8 Nvidia RTX 3090 GPUs and Xeon E5-2680 CPUs, and one with 4 Nvidia V100 GPUs and a Xeon Platinum 8163 CPU.
Software Dependencies | Yes | Our experiments are conducted with Python 3.7.9, and the following packages are used: PyTorch 1.7.1, torchvision 0.8.2 and NumPy 1.19.4.
Experiment Setup | Yes | We use ResNet-50 (He et al., 2016) pre-trained on ImageNet (Deng et al., 2009) as the backbone network for all datasets. All batch normalization (BN) layers are frozen during training. We replace the last FC layer of the ResNet-50 with a 2-layer classifier with 1024 hidden units and apply dropout regularization to the 1024-dimensional output (2048 hidden units are used for DomainNet, adjusting for its larger label space). The network is trained with the Adam (Kingma & Ba, 2015) optimizer. The number of dropout samples m is set to 5. For weight averaging, we use the dense and overfit-aware sampling strategy of Cha et al. (2021). We follow the hyperparameter search protocol of DomainBed (Gulrajani & Lopez-Paz, 2021) and, following Cha et al. (2021), use a reduced search space for computational efficiency. We search the trade-off hyperparameter η over {0.01, 0.1, 1.0} and set η = 0.1 by model selection on the validation sets. Batch size and ResNet dropout rate are fixed at 32 and 0, respectively. The SWAD-specific hyperparameters are not searched and the default values are used. For the DNA hyperparameters, we search the FC dropout rate and η. The FC dropout rate is searched over {0.1, 0.3, 0.5} depending on the dataset: 0.1 is used for DomainNet and 0.5 for the other datasets. η is searched over {0.01, 0.1, 1.0} on PACS and the selected value 0.1 is used for all experiments. The total number of training iterations is 20000 for DomainNet and 5000 for the other datasets, which is sufficient for our method to converge. The evaluation frequency is 500 for DomainNet, 50 for VLCS and 100 for the others. (A sketch of this model/optimizer configuration also appears after the table.)
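
The per-domain 8:2 split described in the Dataset Splits row can be sketched as follows. This is a minimal illustration on synthetic data, assuming the standard torch.utils.data API rather than the authors' DomainBed-based code; the function name split_source_domains is hypothetical.

```python
# Minimal sketch of the 8:2 per-domain split with a pooled validation set.
# Synthetic data and split_source_domains are illustrative only.
import torch
from torch.utils.data import ConcatDataset, TensorDataset, random_split

def split_source_domains(domain_datasets, train_frac=0.8, seed=0):
    """Split each source domain 8:2 and pool the held-out parts into a
    single overall validation set used for model selection."""
    generator = torch.Generator().manual_seed(seed)
    train_sets, val_sets = [], []
    for dataset in domain_datasets:
        n_train = int(len(dataset) * train_frac)
        train_part, val_part = random_split(
            dataset, [n_train, len(dataset) - n_train], generator=generator)
        train_sets.append(train_part)
        val_sets.append(val_part)
    return train_sets, ConcatDataset(val_sets)

# Toy usage: three synthetic "source domains" of 100 samples each.
domains = [TensorDataset(torch.randn(100, 3), torch.randint(0, 7, (100,)))
           for _ in range(3)]
train_sets, overall_val = split_source_domains(domains)
print(len(overall_val))  # 60 = 3 domains x 20 held-out samples
```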
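The Experiment Setup row describes the backbone and classifier concretely enough for a rough sketch. The snippet below mirrors only the stated configuration (ResNet-50 with frozen BN, a 2-layer head with 1024 hidden units, FC dropout, Adam, m = 5 dropout samples); it does not reproduce the DNA loss or SWAD weight averaging, and the learning rate and the helper name forward_dropout_samples are assumptions for illustration.

```python
# Sketch of the stated configuration only; the DNA loss and SWAD weight
# averaging are NOT reproduced here.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 7          # e.g. PACS; set per dataset
FC_DROPOUT = 0.5         # 0.1 for DomainNet per the reported search
M_DROPOUT_SAMPLES = 5    # number of dropout samples m

backbone = models.resnet50(pretrained=True)   # ImageNet pre-trained
feat_dim = backbone.fc.in_features            # 2048 for ResNet-50
backbone.fc = nn.Identity()                   # drop the original FC layer

# Freeze all batch-norm layers (running statistics and affine parameters).
# In a real training loop, re-apply .eval() to BN modules after model.train().
for module in backbone.modules():
    if isinstance(module, nn.BatchNorm2d):
        module.eval()
        for p in module.parameters():
            p.requires_grad_(False)

# 2-layer classifier with 1024 hidden units; dropout on the 1024-dim output.
classifier = nn.Sequential(
    nn.Linear(feat_dim, 1024),
    nn.ReLU(inplace=True),
    nn.Dropout(p=FC_DROPOUT),
    nn.Linear(1024, NUM_CLASSES),
)

params = list(backbone.parameters()) + list(classifier.parameters())
optimizer = torch.optim.Adam(params, lr=5e-5)  # lr is illustrative, not from the paper

def forward_dropout_samples(x, m=M_DROPOUT_SAMPLES):
    """Return m stochastic predictions per input (dropout kept active),
    i.e. the m dropout samples that DNA's prediction averaging relies on."""
    features = backbone(x)
    return torch.stack([classifier(features) for _ in range(m)], dim=0)
```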