Stability Based Generalization Bounds for Exponential Family Langevin Dynamics

Authors: Arindam Banerjee, Tiancong Chen, Xinyan Li, Yingxue Zhou

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Further, empirical results on benchmarks illustrate that our bounds are non-vacuous, quantitatively sharper than existing bounds, and behave correctly under noisy labels. In Section 4, we present experimental results on benchmark datasets. |
| Researcher Affiliation | Academia | (1) Department of Computer Science, University of Illinois Urbana-Champaign; (2) Department of Computer Science, University of Minnesota, Twin Cities. |
| Pseudocode | No | The paper presents mathematical update equations for its algorithms (e.g., the noisy gradient update wt+1 = wt − ηt ∇ℓ(wt, SBt) + N(0, σt² I) in Section 3; see the sketch after this table), but does not include any blocks explicitly labeled "Pseudocode" or "Algorithm". |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | We use MNIST (Le Cun et al., 1998), Fashion-MNIST (Xiao et al., 2017), CIFAR-10 (Krizhevsky, 2009) and CIFAR-100 (Krizhevsky, 2009) in our experiments. |
| Dataset Splits | No | The paper describes the training and test sets for datasets like CIFAR-10 ("The training set includes 50,000 images while the test set contains the rest 10,000 images.") and mentions subsets for MNIST, but it does not specify explicit validation dataset splits or percentages for any of the datasets used. |
| Hardware Specification | Yes | All experiments minimize cross-entropy loss for a fixed number of epochs and have been run on NVIDIA Tesla K40m GPUs. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., "Python 3.8, PyTorch 1.9, and CUDA 11.1") that would be needed to replicate the experiments. |
| Experiment Setup | Yes | For MNIST and Fashion-MNIST, the initial learning rate is η0 = 0.004 and it decays by 0.96 after every 5 epochs. For CIFAR-10, the initial learning rate is η0 = 0.005 and it decays by 0.995 after every 5 epochs. The batch size is 100 for MNIST and Fashion-MNIST and 200 for CIFAR-10, the noise variance is σt = 0.2 ηt, and Tables 3 and 4 provide additional details such as "Inverse Temperature β [5000, 55000]" and "Number of Epochs". |
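
The noisy gradient update quoted in the Pseudocode row, combined with the learning-rate schedule and noise setting from the Experiment Setup row, can be written as a short training loop. The following is a minimal sketch rather than the authors' code: it assumes a PyTorch-style model and an iterable of (inputs, targets) mini-batches, uses the reported MNIST settings (η0 = 0.004, decay factor 0.96 every 5 epochs, batch size 100), and treats the reported noise variance σt = 0.2 ηt directly as the standard deviation of the added Gaussian noise, which is an assumption.

```python
# Minimal sketch (not the authors' released code) of the noisy mini-batch
# gradient update described above:
#     w_{t+1} = w_t - eta_t * grad(loss(w_t, S_{B_t})) + N(0, sigma_t^2 I)
# Assumptions: a PyTorch-style model, an iterable of (inputs, targets)
# batches, and sigma_t = 0.2 * eta_t used as the noise scale.

import torch


def noisy_sgd_step(model, loss_fn, batch, eta_t, sigma_t):
    """One update w <- w - eta_t * grad + N(0, sigma_t^2 I)."""
    inputs, targets = batch
    model.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            p -= eta_t * p.grad                   # gradient step
            p += sigma_t * torch.randn_like(p)    # isotropic Gaussian noise
    return loss.item()


def train(model, loss_fn, loader, num_epochs,
          eta0=0.004, decay=0.96, decay_every=5, noise_scale=0.2):
    """Training loop with the step-wise decay and noise level from the table."""
    eta_t = eta0
    for epoch in range(num_epochs):
        if epoch > 0 and epoch % decay_every == 0:
            eta_t *= decay                        # decay by 0.96 every 5 epochs (MNIST setting)
        sigma_t = noise_scale * eta_t             # reported setting: sigma_t = 0.2 * eta_t
        for batch in loader:                      # mini-batches S_{B_t}, e.g. batch size 100
            noisy_sgd_step(model, loss_fn, batch, eta_t, sigma_t)


if __name__ == "__main__":
    # Illustrative run on random data with a linear model (placeholders only).
    model = torch.nn.Linear(784, 10)
    loss_fn = torch.nn.CrossEntropyLoss()
    data = [(torch.randn(100, 784), torch.randint(0, 10, (100,)))
            for _ in range(5)]
    train(model, loss_fn, data, num_epochs=10)
```

Swapping in η0 = 0.005 with decay factor 0.995 and batch size 200 would correspond to the CIFAR-10 setting reported above.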