Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Tighter Risk Certificates for Neural Networks
Authors: María Pérez-Ortiz, Omar Rivasplata, John Shawe-Taylor, Csaba Szepesvári
JMLR 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper presents an empirical study regarding training probabilistic neural networks using training objectives derived from PAC-Bayes bounds. Our experiments on MNIST and CIFAR-10 show that our training methods produce competitive test set errors and non-vacuous risk bounds with much tighter values than previous results in the literature. |
| Researcher Affiliation | Collaboration | María Pérez-Ortiz EMAIL AI Centre, University College London (UK) Omar Rivasplata EMAIL AI Centre, University College London (UK) John Shawe-Taylor EMAIL AI Centre, University College London (UK) Csaba Szepesvári EMAIL DeepMind Edmonton (Canada) |
| Pseudocode | Yes | Algorithm 1 PAC-Bayes with Backprop (PBB) |
| Open Source Code | Yes | The code for our experiments is publicly available in PyTorch. Code available at https://github.com/mperezortiz/PBB |
| Open Datasets | Yes | Our experiments on MNIST and CIFAR-10 show that our training methods produce competitive test set errors and non-vacuous risk bounds... We trained our models using the standard MNIST data set split of 60000 training and 10000 test examples. For CIFAR-10, we tested three convolutional architectures... and we used the standard data set split of 50000 training and 10000 test examples. |
| Dataset Splits | Yes | We trained our models using the standard MNIST data set split of 60000 training and 10000 test examples. For CIFAR-10... we used the standard data set split of 50000 training and 10000 test examples. We set 4% of the data as validation in MNIST (2400 examples) and 5% in the case of CIFAR-10 (2500 examples). |
| Hardware Specification | No | The paper does not explicitly mention any specific hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The code for our experiments is publicly available in PyTorch. This mentions a software library (PyTorch) but does not provide a specific version number, nor does it list other software dependencies with version numbers. |
| Experiment Setup | Yes | We did a grid sweep over the prior distribution scale hyper-parameter (i.e. standard deviation σ0) with values in [0.1, 0.05, 0.04, 0.03, 0.02, 0.01, 0.005]. For the SGD with momentum optimiser we performed a grid sweep over learning rate in [1e-3, 5e-3, 1e-2] and momentum in [0.95, 0.99]... The dropout rate used for learning the prior was selected from [0.0, 0.05, 0.1, 0.2, 0.3]... We observed that the value pmin = 1e-5 performed well. The lambda value in f_lambda was initialised to 1.0... We ran the training for 100 epochs... We used a training batch size of 250 for all the experiments. |
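The pseudocode row refers to the paper's Algorithm 1, "PAC-Bayes with Backprop (PBB)", which optimises a PAC-Bayes risk bound directly by gradient descent. A minimal sketch of the kind of bound involved is below, assuming the PAC-Bayes-quadratic form; the function name and argument names are illustrative, not taken from the authors' repository.

```python
import math

def pac_bayes_quad_bound(emp_risk, kl, n, delta=0.025):
    """Sketch of a PAC-Bayes-quadratic risk bound.

    emp_risk: empirical risk of the stochastic predictor Q on n samples
    kl:       KL divergence KL(Q || P) between posterior Q and prior P
    n:        number of training examples the bound is computed on
    delta:    confidence parameter (bound holds with prob. >= 1 - delta)
    """
    # Complexity penalty that grows with KL(Q||P) and shrinks with n.
    penalty = (kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n)
    # Quadratic-form upper bound on the true risk.
    return (math.sqrt(emp_risk + penalty) + math.sqrt(penalty)) ** 2

# Example: a small empirical risk with a moderate KL term on 60000 samples.
bound = pac_bayes_quad_bound(emp_risk=0.05, kl=5000.0, n=60000)
```

Training with PBB amounts to minimising such a bound (a training objective derived from it) with respect to the posterior's parameters via backprop, which is how the paper obtains non-vacuous risk certificates.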
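The dataset-splits row can be sanity-checked arithmetically: 4% of the 60000 MNIST training examples and 5% of the 50000 CIFAR-10 training examples match the quoted validation counts. A small check, assuming the validation fraction is taken from the training split:

```python
# Standard training-set sizes quoted in the report.
mnist_train = 60000
cifar_train = 50000

# Validation fractions quoted in the report: 4% (MNIST), 5% (CIFAR-10).
mnist_val = int(0.04 * mnist_train)  # 2400 examples
cifar_val = int(0.05 * cifar_train)  # 2500 examples

print(mnist_val, cifar_val)
```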
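The grid sweep described in the experiment-setup row can be reconstructed as a Cartesian product over the quoted value lists. This is a hypothetical sketch (the variable names `sigma0`, `lr`, `momentum`, `dropout` are ours, not the authors' code):

```python
from itertools import product

# Hyper-parameter value lists quoted in the experiment-setup row.
sigma0_values = [0.1, 0.05, 0.04, 0.03, 0.02, 0.01, 0.005]
learning_rates = [1e-3, 5e-3, 1e-2]
momentum_values = [0.95, 0.99]
dropout_rates = [0.0, 0.05, 0.1, 0.2, 0.3]

# Full grid: one configuration dict per combination.
grid = [
    {"sigma0": s, "lr": lr, "momentum": m, "dropout": d}
    for s, lr, m, d in product(
        sigma0_values, learning_rates, momentum_values, dropout_rates
    )
]

print(len(grid))  # 7 * 3 * 2 * 5 = 210 configurations
```

Whether the authors swept all four lists jointly or in separate stages is not stated in the quoted excerpt; the full product above is only an upper bound on the number of runs.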