Domain-Inspired Sharpness-Aware Minimization Under Domain Shifts
Authors: Ruipeng Zhang, Ziqing Fan, Jiangchao Yao, Ya Zhang, Yanfeng Wang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on various domain generalization benchmarks show the superiority of DISAM over a range of state-of-the-art methods. |
| Researcher Affiliation | Academia | Cooperative Medianet Innovation Center, Shanghai Jiao Tong University Shanghai Artificial Intelligence Laboratory {zhangrp, zqfan_knight, Sunarker, ya_zhang, wangyanfeng}@sjtu.edu.cn |
| Pseudocode | Yes | We give specific algorithmic details for DISAM in Algorithm 1, and the Python code implementation is in Appendix D. |
| Open Source Code | Yes | The source code is released at https://github.com/MediaBrain-SJTU/DISAM. |
| Open Datasets | Yes | We evaluate DISAM on five datasets: PACS (Li et al., 2017), VLCS (Fang et al., 2013), Office-Home (Venkateswara et al., 2017), Terra Incognita (Beery et al., 2018) (abbreviated as TerraInc), and DomainNet (Peng et al., 2019), following the DomainBed benchmark (Gulrajani & Lopez-Paz, 2021). |
| Dataset Splits | Yes | Specifically, the unseen domain is used to evaluate the out-of-domain generalization, and the validation sets of source domains are used to measure the in-domain generalization, while the others are used for training. |
| Hardware Specification | Yes | All experiments were conducted using an NVIDIA GeForce RTX 3090 GPU, Python 3.9.15, PyTorch 1.12.1, and clip 1.0. |
| Software Dependencies | Yes | All experiments were conducted using an NVIDIA GeForce RTX 3090 GPU, Python 3.9.15, PyTorch 1.12.1, and clip 1.0. |
| Experiment Setup | Yes | For model hyperparameters, we adopt the settings in (Wang et al., 2023b) for experiments using ResNet-50 and in (Shu et al., 2023) for experiments using CLIP. By default, we set the perturbation hyperparameter ρ to 0.05 (Wang et al., 2023b) (fixed during training), and the weight of the variance constraint λ to 0.1. |
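The ρ reported in the Experiment Setup row is the perturbation radius of the underlying sharpness-aware minimization (SAM) step that DISAM builds on. As a rough illustration of what that hyperparameter controls, here is a minimal sketch of a plain SAM update on a toy quadratic objective; this is not the authors' DISAM algorithm (which additionally weights the perturbation by domain-level loss variance), and `sam_step`, `grad_fn`, and the learning rate are hypothetical names chosen for the example.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One vanilla SAM update: ascend by rho to a worst-case point,
    then descend using the gradient taken there.

    rho = 0.05 matches the fixed perturbation radius quoted above.
    """
    g = grad_fn(w)
    # Normalized ascent direction (epsilon in the SAM formulation).
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Gradient evaluated at the perturbed weights.
    g_sharp = grad_fn(w + eps)
    # Standard descent step with the sharpness-aware gradient.
    return w - lr * g_sharp

# Toy objective f(w) = ||w||^2 / 2, so grad f(w) = w.
grad_fn = lambda w: w
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w, grad_fn)
print(bool(np.allclose(w, 0.0, atol=1e-2)))  # iterates settle near the minimum
```

Because the perturbation radius is fixed, the iterates stop at a small residual distance from the minimizer rather than converging exactly; DISAM's contribution, per the paper, is making this perturbation domain-aware rather than uniform across source domains.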