Towards Robust Out-of-Distribution Generalization Bounds via Sharpness

Authors: Yingtian Zou, Kenji Kawaguchi, Yingnan Liu, Jiashuo Liu, Mong-Li Lee, Wynne Hsu

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our findings are supported by the experiments on a ridge regression model, as well as the experiments on deep learning classification tasks.
Researcher Affiliation | Academia | School of Computing, National University of Singapore; Institute of Data Science, National University of Singapore; Department of Computer Science & Technology, Tsinghua University, China
Pseudocode | Yes | Algorithm 1: Pseudocode of model sharpness computation (a hedged sketch of such a computation is given below the table).
Open Source Code | No | The paper mentions using existing codebases (e.g., DomainBed) but does not state that the authors are releasing their own implementation of the methodology described in the paper.
Open Datasets | Yes | We choose the 4-layer MLP on the Rotated MNIST dataset, where Rotated MNIST is a rotation of the MNIST handwritten digit dataset (LeCun, 1998) with angles ranging over [0°, 15°, 30°, 45°, 60°, 75°]. To evaluate our theorem more deeply, we examine the relationship between our defined sharpness and the OOD generalization error on larger-scale real-world datasets, Wilds-Camelyon17 (Bandi et al., 2018; Koh et al., 2021) and PACS (Li et al., 2017). A sketch of the rotated-digit construction appears below the table.
Dataset Splits | No | The paper describes training and testing procedures for the datasets but does not explicitly mention or detail a validation split.
Hardware Specification | No | The paper does not specify hardware details such as GPU models, CPU types, or memory used to run the experiments.
Software Dependencies | No | Table 1 lists optimizers (Adam) and other hyperparameters but does not provide version numbers for software libraries or dependencies (e.g., Python, PyTorch/TensorFlow).
Experiment Setup | Yes | Table 1: Hyperparameters we use for different DG algorithms in the experiments [lists optimizer, learning rate, weight decay, batch size, MLP size, η, MMD, γ]. We randomly sample 50 data points and train a linear classifier with gradient descent for 3,000 iterations (see the training sketch below the table).
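
The paper's Algorithm 1 gives pseudocode for computing model sharpness, but the report does not reproduce it. The sketch below is a hypothetical stand-in, not the authors' procedure: it estimates sharpness as the largest loss increase over random parameter perturbations of norm `rho`, a common Monte-Carlo approximation. The function name, `rho`, and `n_samples` are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's Algorithm 1): estimate sharpness as the
# largest loss increase over random parameter perturbations of norm `rho`.
import torch

@torch.no_grad()
def estimate_sharpness(model, loss_fn, data, target, rho=0.05, n_samples=10):
    """Monte-Carlo estimate of max over ||eps|| <= rho of L(w + eps) - L(w)."""
    base_loss = loss_fn(model(data), target).item()
    params = [p for p in model.parameters() if p.requires_grad]
    worst_gap = 0.0
    for _ in range(n_samples):
        # Draw a random direction and rescale it to the perturbation radius rho.
        noise = [torch.randn_like(p) for p in params]
        scale = rho / torch.sqrt(sum((n ** 2).sum() for n in noise))
        for p, n in zip(params, noise):
            p.add_(n * scale)
        worst_gap = max(worst_gap, loss_fn(model(data), target).item() - base_loss)
        for p, n in zip(params, noise):
            p.sub_(n * scale)  # restore the original weights before the next sample
    return worst_gap
```

Adversarial (SAM-style) perturbations would give a tighter estimate than random sampling; random directions are used here only to keep the sketch short.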
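The Rotated MNIST domains quoted in the Open Datasets row are built by rotating MNIST digits by a fixed angle per domain. Below is a minimal sketch of that construction using torchvision; the angles follow the quoted list, while the download path and preprocessing details are assumptions and may differ from the paper's setup.

```python
# Minimal sketch of building Rotated MNIST domains (standard construction;
# the paper's exact preprocessing may differ).
from torchvision import datasets, transforms
import torchvision.transforms.functional as TF

ANGLES = [0, 15, 30, 45, 60, 75]  # one rotation angle per domain

def rotated_mnist_domain(angle, root="./data", train=True):
    transform = transforms.Compose([
        transforms.Lambda(lambda img: TF.rotate(img, angle)),  # rotate each digit image
        transforms.ToTensor(),
    ])
    return datasets.MNIST(root=root, train=train, download=True, transform=transform)

domains = {a: rotated_mnist_domain(a) for a in ANGLES}
```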
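The Experiment Setup row quotes training a linear classifier on 50 randomly sampled data points with 3,000 gradient-descent iterations. The sketch below shows one way such a probe could be run; the learning rate, feature dimensionality, and cross-entropy objective are illustrative assumptions rather than the paper's exact configuration.

```python
# Hedged sketch of the quoted probe: 50 sampled points, a linear classifier,
# 3,000 full-batch gradient-descent steps. Learning rate and loss are assumptions.
import torch

def train_linear_probe(features, labels, num_classes, iters=3000, lr=0.1, n_points=50):
    idx = torch.randperm(features.size(0))[:n_points]   # randomly sample 50 points
    x, y = features[idx], labels[idx]
    clf = torch.nn.Linear(x.size(1), num_classes)
    opt = torch.optim.SGD(clf.parameters(), lr=lr)       # plain gradient descent
    for _ in range(iters):
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(clf(x), y)
        loss.backward()
        opt.step()
    return clf
```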