SWAD: Domain Generalization by Seeking Flat Minima

Authors: Junbum Cha, Sanghyuk Chun, Kyungjae Lee, Han-Cheol Cho, Seunghyun Park, Yunsung Lee, Sungrae Park

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | SWAD shows state-of-the-art performance on five DG benchmarks, namely PACS, VLCS, OfficeHome, TerraIncognita, and DomainNet, with consistent and large margins of +1.6% on average in out-of-domain accuracy. We also compare SWAD with conventional generalization methods, such as data augmentation and consistency regularization methods, to verify that the remarkable performance improvements originate from seeking flat minima, not from better in-domain generalizability. Last but not least, SWAD is readily adaptable to existing DG methods without modification; the combination of SWAD and an existing DG method further improves DG performance. Source code is available at https://github.com/khanrc/swad. Table 1: Comparisons with SOTA. The proposed SWAD outperforms other state-of-the-art DG methods on five different DG benchmarks with significant gaps (+1.6pp on average).
Researcher Affiliation | Collaboration | Junbum Cha (Kakao Brain), Sanghyuk Chun (NAVER AI Lab), Kyungjae Lee (Chung-Ang University), Han-Cheol Cho (NAVER Clova), Seunghyun Park (NAVER Clova), Yunsung Lee (Korea University), Sungrae Park (Upstage AI Research)
Pseudocode | Yes | Detailed pseudocode is provided in Appendix B.4.
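The pseudocode itself is not reproduced here, but the core step it describes, dense weight averaging over a window of training iterates, is easy to sketch. Below is a minimal, hypothetical PyTorch illustration written for this report; the paper's overfit-aware selection of the window boundaries from validation loss (governed by N_s, N_e, and r) is omitted, and all names (`swad_train`, `update_average`, `loader`, `loss_fn`) are invented for illustration.

```python
import copy

import torch


@torch.no_grad()
def update_average(avg_state, new_state, n_averaged):
    """Fold one more iterate into the running average of the weights."""
    for k, v in new_state.items():
        if v.is_floating_point():
            # Incremental mean: avg <- avg + (v - avg) / (n + 1)
            avg_state[k] += (v - avg_state[k]) / (n_averaged + 1)
        else:
            avg_state[k] = v.clone()  # integer buffers: keep the latest value


def swad_train(model, loader, optimizer, loss_fn, t_start, t_end):
    """Train normally, densely averaging every iterate in [t_start, t_end]."""
    avg_state, n_averaged = None, 0
    for t, (x, y) in enumerate(loader):
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
        if t_start <= t <= t_end:  # average every iterate inside the window
            state = {k: v.detach() for k, v in model.state_dict().items()}
            if avg_state is None:
                avg_state = copy.deepcopy(state)
            else:
                update_average(avg_state, state, n_averaged)
            n_averaged += 1
    model.load_state_dict(avg_state)  # deploy the averaged (flat) solution
    return model
```

Unlike vanilla SWA, which averages one snapshot per epoch or learning-rate cycle, this sketch averages every iterate inside the window, which is the "dense" sampling the paper argues for.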
Open Source Code | Yes | Source code is available at https://github.com/khanrc/swad.
Open Datasets | Yes | Dataset and optimization protocol. Following Gulrajani and Lopez-Paz [22], we exhaustively evaluate our method and comparison methods on various benchmarks: PACS [7] (9,991 images, 7 classes, and 4 domains), VLCS [43] (10,729 images, 5 classes, and 4 domains), OfficeHome [44] (15,588 images, 65 classes, and 4 domains), TerraIncognita [45] (24,788 images, 10 classes, and 4 domains), and DomainNet [46] (586,575 images, 345 classes, and 6 domains).
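For quick reference, the quoted benchmark statistics can be collected into a small lookup table; the snippet below is a convenience summary written for this report, not code from the released repository.

```python
# Benchmark statistics as quoted from the paper: images, classes, domains.
DG_BENCHMARKS = {
    "PACS":           {"images": 9_991,   "classes": 7,   "domains": 4},
    "VLCS":           {"images": 10_729,  "classes": 5,   "domains": 4},
    "OfficeHome":     {"images": 15_588,  "classes": 65,  "domains": 4},
    "TerraIncognita": {"images": 24_788,  "classes": 10,  "domains": 4},
    "DomainNet":      {"images": 586_575, "classes": 345, "domains": 6},
}
```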
Dataset Splits | Yes | For training, we choose a domain as the target domain and use the remaining domains as the training domains, where 20% of the samples are used for validation and model selection.
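This is the standard leave-one-domain-out protocol; a minimal sketch of the split logic is shown below. The function name and signature are hypothetical, and `domains` is assumed to map a domain name to its list of samples.

```python
import random


def leave_one_domain_out(domains, target, val_fraction=0.2, seed=0):
    """Hold out the target domain for testing; split each remaining source
    domain 80%/20% into training/validation, as described above."""
    rng = random.Random(seed)
    train, val = [], []
    for name, samples in domains.items():
        if name == target:
            continue  # the target domain is never seen during training
        shuffled = samples[:]
        rng.shuffle(shuffled)
        n_val = int(len(shuffled) * val_fraction)
        val.extend(shuffled[:n_val])
        train.extend(shuffled[n_val:])
    return train, val, domains[target]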
Hardware Specification | No | The paper mentions the use of an "ImageNet [42] trained ResNet-50 [41]" and the "Adam [38] optimizer" but does not specify hardware details such as GPU/CPU models, memory, or the computing environment used for the experiments.
Software Dependencies | No | The paper states that an "ImageNet [42] trained ResNet-50 [41] is employed as the initial weight, and optimized by Adam [38] optimizer" and credits "NAVER Smart Machine Learning (NSML) [62] and Kakao Brain Cloud platform" for the experiments, but it does not specify version numbers for any software dependencies such as PyTorch, TensorFlow, or Python.
Experiment Setup | Yes | For training, we choose a domain as the target domain and use the remaining domains as the training domains, where 20% of the samples are used for validation and model selection. An ImageNet [42] trained ResNet-50 [41] is employed as the initial weight and optimized by the Adam [38] optimizer with a learning rate of 5e-5. We construct a mini-batch containing all domains, where each domain has 32 images. We set the SWAD hyperparameters N_s to 3, N_e to 6, and r to 1.2 for VLCS and 1.3 for the others by HP search on the validation sets.
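The reported setup translates into a short configuration sketch; this is an assumption-laden illustration, not the paper's released training script, and the pretrained-weight loading API assumes torchvision >= 0.13.

```python
import torch
import torchvision

# Reported setup: ImageNet-pretrained ResNet-50, Adam with lr 5e-5, and a
# mini-batch holding 32 images from every source domain.
model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)

NUM_SOURCE_DOMAINS = 3            # e.g., PACS with one domain held out
IMAGES_PER_DOMAIN = 32
batch_size = NUM_SOURCE_DOMAINS * IMAGES_PER_DOMAIN  # 96 images per step

# SWAD-specific hyperparameters reported in the paper; r is 1.2 for VLCS
# and 1.3 for the other benchmarks.
SWAD_HPS = {"N_s": 3, "N_e": 6, "r": 1.3}
```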