Online PAC-Bayes Learning

Authors: Maxime Haddouche, Benjamin Guedj

NeurIPS 2022

Reproducibility assessment — each item lists the variable, the result, and the LLM response:
- Research Type — Experimental: "We then propose several algorithms with their associated training and test bounds, as well as a short series of experiments to evaluate the consistency of our online PAC-Bayesian approach. Our efficiency criterion is not the classical regret but an expected cumulative loss close to the one of Wintenberger [2021]. More precisely, Sec. 3 proposes a stable yet time-consuming Gibbs-based algorithm, while Sec. 4 proposes time-efficient yet volatile algorithms. We emphasise that our PAC-Bayesian results only require a bounded loss to hold: no assumption is made on the data distribution, priors can be data-dependent, and we do not require any convexity assumption on the loss, as commonly assumed in the OL framework. Sec. 5 gathers supporting experiments."
- Researcher Affiliation — Academia: Maxime Haddouche (Inria and University College London, France and UK); Benjamin Guedj (Inria and University College London, France and UK).
- Pseudocode — Yes: Algorithm 1, "A general OPBD algorithm for Gaussian measures with fixed variance."
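A schematic, hedged reading of what such an OPBD step could look like: the online posterior at round i is a Gaussian N(mu_i, sigma^2 I) whose covariance is held fixed, and only the mean is updated from the newly revealed example. The squared loss and the plain gradient step below are illustrative assumptions, not the paper's exact update rule.

```python
import numpy as np

def opbd_gaussian(data, d, lam, sigma, rng=None):
    """Sketch of an online PAC-Bayes update for N(mu, sigma^2 I), fixed variance.

    `data` is a sequence of (x, y) pairs revealed one at a time; `lam` plays the
    role of the lambda parameter quoted in the experiment setup. Illustrative only.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    mu = np.zeros(d)                       # initial mean, 0 in R^d
    for x, y in data:
        w = rng.normal(mu, sigma)          # draw a predictor from the current posterior
        pred = w @ x
        grad = 2.0 * (pred - y) * x        # gradient of an illustrative squared loss
        mu = mu - lam * grad               # update the mean; covariance stays fixed
    return mu
```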
- Open Source Code — Yes: "anonymised code available here." Checklist answer to "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)?": "[Yes] We have included the URL in Sec. 5; note that this is an anonymous repository."
- Open Datasets — Yes: "We consider four real-world datasets: two for classification (Breast Cancer and Pima Indians) and two for regression (Boston Housing and California Housing). All datasets except Pima Indians have been directly extracted from sklearn [Pedregosa et al., 2011]. The Breast Cancer dataset [Street et al., 1993] is available here and comes from the UCI ML repository, as does the Boston Housing dataset [Belsley et al., 2005], which can be obtained here. The California Housing dataset [Pace and Barry, 1997] comes from the StatLib repository and is available here. Finally, the Pima Indians dataset [Smith et al., 1988] has been recovered from this Kaggle repository."
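As a minimal sketch of reproducing the data access, the sklearn-hosted classification dataset can be loaded directly. Note two assumptions here: Breast Cancer ships bundled with scikit-learn, while `load_boston` was removed in scikit-learn 1.2, so a pre-1.2 version would be needed for Boston Housing (not shown), and California Housing is downloaded on first use via `fetch_california_housing`.

```python
# Hedged sketch: loading one of the sklearn-hosted datasets named in the paper.
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)  # binary classification task
print(X.shape, y.shape)
```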
- Dataset Splits — No: The paper mentions using several datasets and permuting observations, but it does not specify any training, validation, or test splits by percentage or sample count, nor does it refer to predefined splits.
- Hardware Specification — Yes: "We ran our experiments on a 2021 MacBook Pro with an M1 chip and 16 GB RAM."
- Software Dependencies — No: The paper mentions "extracted from sklearn [Pedregosa et al., 2011]" but does not provide a specific version number for scikit-learn or any other software dependency used in the experiments.
- Experiment Setup — Yes: "For OGD, the initialisation point is 0 ∈ R^d and the learning rate is set to η = 1/√m. For SVB, the mean is initialised to 0 ∈ R^d and the covariance matrix to Diag(1). The step at time i is η_i = 0.1/i. For both of the OPB algorithms with Gibbs posterior, we chose λ = 1/m. As priors, we took respectively a centered Gaussian vector with covariance matrix Diag(σ²) (σ = 1.5) and an i.i.d. vector following the standard Laplace distribution. For the OPBD algorithm with 1, we chose λ = 10^-4/m, the initial mean is 0 ∈ R^d and our fixed covariance matrix is Diag(σ²) with σ = 3·10^-3. For the OPBD algorithm with 1, we chose λ = 2·10^-3/m, the initial mean is 0 ∈ R^d and our covariance matrix is Diag(σ²) with σ = 10^-2."
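The OGD baseline quoted above can be sketched in a few lines: start from 0 in R^d and use a fixed learning rate η = 1/√m, where m is the number of online rounds. The squared loss below is an illustrative stand-in for whatever bounded loss the paper actually uses.

```python
import numpy as np

def ogd(data, d):
    """Online gradient descent sketch matching the quoted setup: w_0 = 0 in R^d,
    learning rate eta = 1/sqrt(m). Returns the final iterate and the cumulative loss."""
    m = len(data)
    eta = 1.0 / np.sqrt(m)
    w = np.zeros(d)
    cum_loss = 0.0
    for x, y in data:
        pred = w @ x
        cum_loss += (pred - y) ** 2        # illustrative squared loss
        grad = 2.0 * (pred - y) * x        # its gradient w.r.t. w
        w = w - eta * grad
    return w, cum_loss
```

The cumulative loss, rather than regret, is the efficiency criterion the assessment quotes from the paper, which is why the sketch accumulates it alongside the iterate.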