Certifiable Out-of-Distribution Generalization
Authors: Nanyang Ye, Lin Zhu, Jia Wang, Zhaoyu Zeng, Jiayao Shao, Chensheng Peng, Bikang Pan, Kaican Li, Jun Zhu
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments Results: This section will demonstrate the effectiveness of the proposed algorithmic framework with empirical experiments, as it was found that benchmark results on OoD datasets are susceptible to hyper-parameter choices. For a fair comparison, we evaluate the effectiveness of our method with the OoD-Bench suite (Ye et al. 2021) based on the DomainBed implementation (Gulrajani and Lopez-Paz 2021). With the OoD-Bench suite, we can evaluate OoD generalization performance on datasets dominated by diversity shifts or correlation shifts. Next, ablation studies are conducted for further analysis. |
| Researcher Affiliation | Collaboration | 1 Shanghai Jiao Tong University, Shanghai, China 2 University of Cambridge, Cambridge, United Kingdom 3 University of Warwick, Warwick, United Kingdom 4 ShanghaiTech University, Shanghai, China 5 Huawei Noah's Ark Lab, Hong Kong, China 6 Tsinghua University, Beijing, China |
| Pseudocode | Yes | Algorithm 1: Training procedure of stochastic disturbance learning |
| Open Source Code | Yes | Our code is available at https://github.com/ZlatanWilliams/StochasticDisturbanceLearning. |
| Open Datasets | Yes | We have selected PACS (Li et al. 2017), OfficeHome (Venkateswara et al. 2017), Terra Incognita (Beery, Horn, and Perona 2018), and Camelyon17-WILDS (Koh et al. 2020) for benchmarking on the diversity shift datasets, and Colored MNIST (Arjovsky et al. 2019), NICO (He, Shen, and Cui 2020), and a modified version of CelebA (Liu et al. 2015) for benchmarking on the correlation shift datasets. |
| Dataset Splits | No | No explicit details on specific percentages or sample counts for training, validation, and test splits are provided. The paper mentions using the 'OoD-Bench suite' and 'DomainBed implementation' and discusses training and testing, but lacks specific numerical splits. |
| Hardware Specification | No | No specific hardware details (like GPU models, CPU types, or memory) were provided. The paper only mentions the models used for different datasets (ResNet-18, multi-layer perceptron). |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) are listed. The paper mentions using the 'OoD-Bench suite' and 'DomainBed implementation'. |
| Experiment Setup | Yes | Require: Training set (X, Y), maximum number of epochs T, percentage of max-margin training epochs κ, percentage of top loss samples used in max-margin training η, batch-size B, variance of Gaussian distribution σ. Ensure: The model's parameters θ. ... For hyper-parameter search, we run twenty iterations for each algorithm and the search procedure is repeated three times. |
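To make the quoted Algorithm 1 interface concrete, the sketch below shows one plausible reading of a stochastic-disturbance training loop over the stated inputs (T, κ, η, B, σ). It is a hypothetical NumPy illustration on a linear classifier, not the paper's implementation: the Gaussian disturbance is applied to the weights at each gradient evaluation, and the final κ·T epochs train only on the top-η fraction of highest-loss samples as a stand-in for the max-margin phase. All function and variable names here are assumptions for illustration.

```python
import numpy as np

def sdl_train(X, Y, T=50, kappa=0.2, eta=0.5, B=16, sigma=0.1, lr=0.1, seed=0):
    """Hypothetical sketch of Algorithm 1's loop for a linear classifier
    with logistic loss; labels Y are in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)  # model parameters theta

    def per_sample_loss(w):
        # logistic loss log(1 + exp(-y * w.x)) for each sample
        return np.log1p(np.exp(-Y * (X @ w)))

    for t in range(T):
        if t >= int((1 - kappa) * T):
            # max-margin phase: keep only the top-eta hardest samples
            k = max(1, int(eta * n))
            idx = np.argsort(-per_sample_loss(w))[:k]
        else:
            idx = np.arange(n)
        rng.shuffle(idx)
        for start in range(0, len(idx), B):
            batch = idx[start:start + B]
            Xb, Yb = X[batch], Y[batch]
            # stochastic disturbance: evaluate the gradient at a
            # Gaussian-perturbed copy of the weights (variance sigma)
            w_noisy = w + rng.normal(0.0, sigma, size=d)
            p = 1.0 / (1.0 + np.exp(Yb * (Xb @ w_noisy)))
            grad = -(Yb * p) @ Xb / len(batch)
            w -= lr * grad
    return w
```

A hyper-parameter search over this interface, as described in the quote, would simply rerun `sdl_train` for twenty sampled configurations of (κ, η, B, σ) and repeat the whole search three times.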