Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Robustness Guarantees for Bayesian Inference with Gaussian Processes
Authors: Luca Cardelli, Marta Kwiatkowska, Luca Laurenti, Andrea Patane (pp. 7759-7768)
AAAI 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our techniques on two examples, a GP regression problem and a fully-connected deep neural network, where we rely on weak convergence to GPs to study adversarial examples on the MNIST dataset. |
| Researcher Affiliation | Collaboration | ¹Microsoft Research Cambridge, ²University of Oxford |
| Pseudocode | No | The paper describes algorithmic methods, but it does not include a clearly labeled pseudocode block or algorithm steps in a structured format. |
| Open Source Code | Yes | Code available at: https://github.com/andreapatane/checkGP. |
| Open Datasets | Yes | we train a selection of ReLU GPs on a subset of the MNIST dataset |
| Dataset Splits | No | The paper discusses training on a subset of the MNIST dataset using least-square classification and notes "Classification accuracy obtained on the full MNIST test set". It mentions using 100 to 2000 training samples, but does not specify a validation split or how one would reproduce it. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9'). It mentions using SIFT and discusses GPs and NNs, but without versioned software details. |
| Experiment Setup | Yes | Unless otherwise stated, we perform analysis on the best model obtained using 1000 training samples, that is, a two-hidden-layer architecture with $\sigma_w^2 = 3.19$ and $\sigma_b^2 = 0.00$. |
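
To make the Experiment Setup row concrete, the following is a minimal sketch of how a two-hidden-layer ReLU GP kernel with the reported variances could be evaluated. It assumes the standard infinite-width ReLU (arc-cosine) recursion from the neural-network GP literature; the function name `relu_nngp_kernel`, the division of the input inner product by the input dimension, and the use of NumPy are illustrative assumptions, not taken from the paper or from the checkGP repository.

```python
import numpy as np


def relu_nngp_kernel(X1, X2, sigma_w2=3.19, sigma_b2=0.0, n_hidden=2):
    """Covariance between rows of X1 and X2 under an infinite-width ReLU network.

    Hypothetical helper: the defaults mirror the values quoted in the table
    (sigma_w^2 = 3.19, sigma_b^2 = 0.00, two hidden layers); everything else
    is an assumption made for illustration.
    """
    X1 = np.atleast_2d(np.asarray(X1, dtype=float))
    X2 = np.atleast_2d(np.asarray(X2, dtype=float))
    d = X1.shape[1]

    # Input-layer covariances (assumed normalisation by input dimension d).
    k12 = sigma_b2 + sigma_w2 * (X1 @ X2.T) / d
    k11 = sigma_b2 + sigma_w2 * np.sum(X1 * X1, axis=1) / d
    k22 = sigma_b2 + sigma_w2 * np.sum(X2 * X2, axis=1) / d

    for _ in range(n_hidden):
        norm = np.sqrt(np.outer(k11, k22))
        # Guard against zero-norm inputs, then clip rounding error before arccos.
        cos_theta = np.divide(k12, norm, out=np.zeros_like(k12), where=norm > 0)
        cos_theta = np.clip(cos_theta, -1.0, 1.0)
        theta = np.arccos(cos_theta)

        # ReLU arc-cosine update for the cross-covariance.
        k12 = sigma_b2 + sigma_w2 / (2.0 * np.pi) * norm * (
            np.sin(theta) + (np.pi - theta) * cos_theta
        )
        # On the diagonal theta = 0, so the update reduces to a scaling by sigma_w^2 / 2.
        k11 = sigma_b2 + 0.5 * sigma_w2 * k11
        k22 = sigma_b2 + 0.5 * sigma_w2 * k22

    return k12


# Usage: kernel matrix for two orthogonal toy inputs.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
K = relu_nngp_kernel(X, X)
```

The resulting matrix `K` could then serve as the prior covariance in an ordinary GP regression or least-squares classification routine; the paper's own training and verification procedure is not reproduced here.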