Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Certifiable Out-of-Distribution Generalization
Authors: Nanyang Ye, Lin Zhu, Jia Wang, Zhaoyu Zeng, Jiayao Shao, Chensheng Peng, Bikang Pan, Kaican Li, Jun Zhu
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments Results: This section will demonstrate the effectiveness of the proposed algorithmic framework with empirical experiments, as it was found that benchmark results on OoD datasets are susceptible to hyper-parameter choices. For a fair comparison, we evaluate the effectiveness of our method with the OoD-Bench suite (Ye et al. 2021) based on the DomainBed implementation (Gulrajani and Lopez-Paz 2021). With the OoD-Bench suite, we can evaluate the OoD generalization performance on datasets dominated by diversity shifts or correlation shifts. Next, ablation studies are conducted for further analysis. |
| Researcher Affiliation | Collaboration | (1) Shanghai Jiao Tong University, Shanghai, China; (2) University of Cambridge, Cambridge, United Kingdom; (3) University of Warwick, Warwick, United Kingdom; (4) ShanghaiTech University, Shanghai, China; (5) Huawei Noah's Ark Lab, Hong Kong, China; (6) Tsinghua University, Beijing, China |
| Pseudocode | Yes | Algorithm 1: Training procedure of stochastic disturbance learning |
| Open Source Code | Yes | Our code is available at https://github.com/ZlatanWilliams/StochasticDisturbanceLearning. |
| Open Datasets | Yes | We have selected PACS (Li et al. 2017), OfficeHome (Venkateswara et al. 2017), Terra Incognita (Beery, Horn, and Perona 2018), and Camelyon17-WILDS (Koh et al. 2020) for benchmarking on the diversity shift datasets, and Colored MNIST (Arjovsky et al. 2019), NICO (He, Shen, and Cui 2020), and a modified version of CelebA (Liu et al. 2015) for benchmarking on the correlation shift datasets. |
| Dataset Splits | No | No explicit percentages or sample counts are given for the training, validation, and test splits. The paper mentions using the OoD-Bench suite and the DomainBed implementation and discusses training and testing, but does not state the numerical splits. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, or memory) are provided. The paper only mentions the models used for different datasets (ResNet-18, multi-layer perceptron). |
| Software Dependencies | No | No software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions) are listed. The paper mentions using the OoD-Bench suite and the DomainBed implementation. |
| Experiment Setup | Yes | Require: Training set (X, Y), maximum number of epochs T, percentage of max-margin training epochs κ, percentage of top loss samples used in max-margin training η, batch-size B, variance of Gaussian distribution σ. Ensure: The model's parameters θ. ... For hyper-parameter search, we run twenty iterations for each algorithm and the search procedure is repeated three times. |
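The search protocol in the last row (twenty hyper-parameter draws per algorithm, with the whole search repeated three times) and the per-batch selection of the top-η loss samples can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`top_loss_indices`, `sample_hparams`, `build_sweep`), the algorithm labels, and the search-space ranges are all hypothetical, and η is treated as a fraction of the batch rather than a literal percentage. The sweep layout loosely follows a DomainBed-style random search in which each repeat draws fresh configurations.

```python
import math
import random


def top_loss_indices(losses, eta):
    """Return indices of the ceil(eta * n) samples with the largest loss.

    Hypothetical helper: eta is assumed to be a fraction in (0, 1] of the batch.
    """
    if not 0.0 < eta <= 1.0:
        raise ValueError("eta must be in (0, 1]")
    k = max(1, math.ceil(eta * len(losses)))
    # Rank sample indices by loss, largest first, and keep the top k.
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(ranked[:k])


def sample_hparams(rng):
    """Draw one configuration from a HYPOTHETICAL search space (not the paper's)."""
    return {
        "lr": 10 ** rng.uniform(-5.0, -3.5),
        "batch_size": int(2 ** rng.uniform(3.0, 9.0)),
        "eta": rng.uniform(0.1, 0.5),
        "sigma": 10 ** rng.uniform(-3.0, -1.0),
    }


def build_sweep(algorithms, n_hparams=20, n_trials=3, seed=0):
    """List one job per (algorithm, trial, hparam draw): 20 draws x 3 repeats."""
    jobs = []
    for a_idx, algorithm in enumerate(algorithms):
        for trial in range(n_trials):
            for h_idx in range(n_hparams):
                # Deterministic, unique per-job seed so the sweep can be regenerated exactly.
                job_seed = seed + (a_idx * n_trials + trial) * n_hparams + h_idx
                rng = random.Random(job_seed)
                jobs.append({
                    "algorithm": algorithm,
                    "trial": trial,
                    "hparam_index": h_idx,
                    "hparams": sample_hparams(rng),
                })
    return jobs


if __name__ == "__main__":
    # Hypothetical labels: "SDL" for the proposed method, "ERM" as a baseline.
    jobs = build_sweep(["SDL", "ERM"])
    print(len(jobs), "jobs")
    print(top_loss_indices([0.1, 0.9, 0.5, 0.7], 0.5))
```

With two algorithms this enumerates 2 × 3 × 20 = 120 training jobs for a single dataset and test-domain setting. In a real sweep each job would train a model, and a model-selection rule (not shown here) would pick the reported configuration per repeat.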