Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Tighter Risk Certificates for Neural Networks

Authors: María Pérez-Ortiz, Omar Rivasplata, John Shawe-Taylor, Csaba Szepesvári

JMLR 2021

Reproducibility Variable Result LLM Response
Research Type Experimental This paper presents an empirical study regarding training probabilistic neural networks using training objectives derived from PAC-Bayes bounds. Our experiments on MNIST and CIFAR-10 show that our training methods produce competitive test set errors and non-vacuous risk bounds with much tighter values than previous results in the literature.
Researcher Affiliation Collaboration María Pérez-Ortiz EMAIL AI Centre, University College London (UK) Omar Rivasplata EMAIL AI Centre, University College London (UK) John Shawe-Taylor EMAIL AI Centre, University College London (UK) Csaba Szepesvári EMAIL DeepMind, Edmonton (Canada)
Pseudocode Yes Algorithm 1 PAC-Bayes with Backprop (PBB)
Open Source Code Yes The code for our experiments is publicly available in PyTorch. Code available at https://github.com/mperezortiz/PBB
Open Datasets Yes Our experiments on MNIST and CIFAR-10 show that our training methods produce competitive test set errors and non-vacuous risk bounds... We trained our models using the standard MNIST data set split of 60000 training and 10000 test examples. For CIFAR-10, we tested three convolutional architectures... and we used the standard data set split of 50000 training and 10000 test examples.
Dataset Splits Yes We trained our models using the standard MNIST data set split of 60000 training and 10000 test examples. For CIFAR-10... we used the standard data set split of 50000 training and 10000 test examples. We set 4% of the data as validation in MNIST (2400 examples) and 5% in the case of CIFAR-10 (2500 examples).
Hardware Specification No The paper does not explicitly mention any specific hardware (e.g., GPU models, CPU types) used for running the experiments.
Software Dependencies No The code for our experiments is publicly available in PyTorch. This mentions a software library (PyTorch) but does not provide a specific version number, nor does it list other software dependencies with version numbers.
Experiment Setup Yes We did a grid sweep over the prior distribution scale hyper-parameter (i.e. standard deviation σ0) with values in [0.1, 0.05, 0.04, 0.03, 0.02, 0.01, 0.005]. For the SGD with momentum optimiser we performed a grid sweep over learning rate in [1e-3, 5e-3, 1e-2] and momentum in [0.95, 0.99]... The dropout rate used for learning the prior was selected from [0.0, 0.05, 0.1, 0.2, 0.3]... We observed that the value pmin = 1e-5 performed well. The lambda value in f_lambda was initialised to 1.0... We ran the training for 100 epochs... We used a training batch size of 250 for all the experiments.
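The risk certificates this report scores are PAC-Bayes-kl bounds evaluated after training. As a rough illustration (not the authors' code; the function names, the bisection tolerance, and the delta value are assumptions for this sketch), a certificate of this type can be computed from an empirical risk, the KL divergence between posterior and prior, and the training set size by inverting the binary KL divergence:

```python
import math

def binary_kl(q, p):
    # KL divergence between Bernoulli(q) and Bernoulli(p), clamped for stability.
    eps = 1e-12
    q = min(max(q, eps), 1 - eps)
    p = min(max(p, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def kl_inverse_upper(emp_risk, rhs, tol=1e-9):
    # Largest p in [emp_risk, 1] with kl(emp_risk || p) <= rhs, found by bisection.
    lo, hi = emp_risk, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_kl(emp_risk, mid) <= rhs:
            lo = mid
        else:
            hi = mid
    return lo

def pac_bayes_kl_certificate(emp_risk, kl_posterior_prior, n, delta=0.025):
    # PAC-Bayes-kl bound: with prob. >= 1 - delta,
    #   kl(emp_risk || true_risk) <= (KL(Q||P) + ln(2*sqrt(n)/delta)) / n,
    # so the certificate is the KL-inverse of the right-hand side.
    rhs = (kl_posterior_prior + math.log(2 * math.sqrt(n) / delta)) / n
    return kl_inverse_upper(emp_risk, rhs)
```

For example, with an empirical risk of 0.02, a posterior-prior KL of 5000 nats, and n = 60000 (the MNIST training set size quoted above), the function returns a non-vacuous upper bound on the true risk; the bound tightens as the KL term shrinks.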