Certifiable Out-of-Distribution Generalization

Authors: Nanyang Ye, Lin Zhu, Jia Wang, Zhaoyu Zeng, Jiayao Shao, Chensheng Peng, Bikang Pan, Kaican Li, Jun Zhu

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments Results: This section will demonstrate the effectiveness of the proposed algorithmic framework with empirical experiments, as it was found that benchmark results on OoD datasets are susceptible to hyper-parameter choices. For a fair comparison, we evaluate the effectiveness of our method with the OoD-Bench suite (Ye et al. 2021) based on the DomainBed implementation (Gulrajani and Lopez-Paz 2021). With the OoD-Bench suite, we can evaluate OoD generalization performance on datasets dominated by diversity shifts or correlation shifts. Next, ablation studies are conducted for further analysis."
Researcher Affiliation | Collaboration | 1 Shanghai Jiao Tong University, Shanghai, China; 2 University of Cambridge, Cambridge, United Kingdom; 3 University of Warwick, Warwick, United Kingdom; 4 ShanghaiTech University, Shanghai, China; 5 Huawei Noah's Ark Lab, Hong Kong, China; 6 Tsinghua University, Beijing, China
Pseudocode | Yes | "Algorithm 1: Training procedure of stochastic disturbance learning"
Open Source Code | Yes | "Our code is available at https://github.com/ZlatanWilliams/StochasticDisturbanceLearning."
Open Datasets | Yes | "We have selected PACS (Li et al. 2017), OfficeHome (Venkateswara et al. 2017), Terra Incognita (Beery, Horn, and Perona 2018), and Camelyon17-WILDS (Koh et al. 2020) for benchmarking on the diversity shift datasets, and Colored MNIST (Arjovsky et al. 2019), NICO (He, Shen, and Cui 2020), and a modified version of CelebA (Liu et al. 2015) for benchmarking on the correlation shift datasets."
Dataset Splits | No | No explicit details on specific percentages or sample counts for training, validation, and test splits are provided. The paper mentions using the OoD-Bench suite and DomainBed implementation and discusses training and testing, but lacks specific numerical splits.
Hardware Specification | No | No specific hardware details (such as GPU models, CPU types, or memory) were provided. The paper only mentions the models used for different datasets (ResNet-18, multi-layer perceptron).
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) are listed. The paper mentions using the OoD-Bench suite and DomainBed implementation.
Experiment Setup | Yes | "Require: Training set (X, Y), maximum number of epochs T, percentage of max-margin training epochs κ, percentage of top-loss samples used in max-margin training η, batch size B, variance of Gaussian distribution σ. Ensure: The model's parameters θ. ... For hyper-parameter search, we run twenty iterations for each algorithm and the search procedure is repeated three times."
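The "Experiment Setup" row lists the inputs to Algorithm 1 (training set, epochs T, max-margin fraction κ, top-loss fraction η, batch size B, Gaussian variance σ). A minimal PyTorch sketch of how such a stochastic-disturbance training loop could be wired up is shown below. The function name `train_sdl` and the exact loop structure are assumptions; in particular, the "max-margin phase" is interpreted here as the last κ·T epochs training only on the top-η fraction of high-loss samples. This is an illustration under those assumptions, not the authors' implementation.

```python
import torch

def train_sdl(model, loader, T=100, kappa=0.2, eta=0.1, sigma=0.01, lr=1e-3):
    """Hypothetical sketch of stochastic disturbance learning (Algorithm 1)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss(reduction="none")
    for epoch in range(T):
        for x, y in loader:
            # Stochastic disturbance: perturb parameters with Gaussian noise
            # of variance sigma**2 before computing the loss.
            noise = [sigma * torch.randn_like(p) for p in model.parameters()]
            with torch.no_grad():
                for p, n in zip(model.parameters(), noise):
                    p.add_(n)
            losses = loss_fn(model(x), y)
            if epoch >= int((1.0 - kappa) * T):
                # Max-margin phase (last kappa*T epochs): keep only the
                # top-eta fraction of high-loss samples in this batch.
                k = max(1, int(eta * losses.numel()))
                losses = torch.topk(losses, k).values
            loss = losses.mean()
            opt.zero_grad()
            loss.backward()
            # Remove the disturbance so the gradient step updates the
            # unperturbed parameters.
            with torch.no_grad():
                for p, n in zip(model.parameters(), noise):
                    p.sub_(n)
            opt.step()
    return model
```

Gradients are evaluated at the randomly perturbed parameters but applied to the clean ones, which is one common way to realize a smoothed/disturbed training objective.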