PAC-Bayes Un-Expected Bernstein Inequality
Authors: Zakaria Mhammedi, Peter Grünwald, Benjamin Guedj
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we experimentally compare our bound in Theorem 3 to that of TS [39], Catoni [9, Theorem 1.2.8] (with α = 2), and Maurer in (2). For the latter, given Ln(Pn) ∈ [0,1[ and the RHS of (2), we solve for an upper bound of L(Pn) by inverting the kl. We note that TS [39] do not claim that their bound is better than Maurer's in classification (in fact, they do better in other settings). A kl-inversion sketch is given after the table. |
| Researcher Affiliation | Academia | Zakaria Mhammedi (The Australian National University and Data61, zak.mhammedi@anu.edu.au); Peter D. Grünwald (CWI and Leiden University, pdg@cwi.nl); Benjamin Guedj (Inria and University College London, benjamin.guedj@inria.fr) |
| Pseudocode | No | The paper contains mathematical derivations and definitions but no structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described. |
| Open Datasets | Yes | UCI datasets. For the second experiment, we use several UCI datasets. These are listed in Table 1 (where Breast-C. stands for Breast Cancer). We encode categorical variables in appropriate 0-1 vectors. This effectively increases the dimension of the input space (this is reported as d in Table 1). After removing any rows (i.e. instances) containing missing features and performing the encoding, the input data is scaled such that every column has values between -1 and 1. We used a 5-fold train-test split (n in Table 1 is the training set size), and the results in Table 1 are averages over 5 runs. We only compare with Maurer's bound since other bounds were worse than Maurer's and ours on all datasets. A preprocessing and splitting sketch is given after the table. |
| Dataset Splits | Yes | We used a 5-fold train-test split (n in Table 1 is the training set size), and the results in Table 1 are averages over 5 runs. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for its experiments. |
| Software Dependencies | No | The paper mentions using the BFGS algorithm but does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | Parameters. We set δ = 0.05. For all datasets, we use λ = 0.01, and (approximately) solve (8) using the BFGS algorithm. For each bound, we pick the σ² ∈ {1/2, ..., 1/2^J}, with J = ⌈log₂ n⌉, which minimizes it on the given data (with n instances). In order for the bounds to still hold with probability at least 1 − δ, we replace δ on the RHS of each bound by δ/⌈log₂ n⌉ (this follows from the application of a union bound). We choose the prior variance such that σ₀² = 1/2 (this was the best value on average for the bounds we compare against). We choose the grid G in Theorem 3 as in (7). Finally, we approximate Gaussian expectations using Monte Carlo sampling. A sketch of this setup is given after the table. |
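
The kl inversion quoted under Research Type can be illustrated with a short sketch. This is a minimal reconstruction, not the authors' code: it assumes Maurer's bound has already been evaluated to give a right-hand side with kl(Ln(Pn) ‖ L(Pn)) ≤ rhs_bound, and recovers the implied upper bound on L(Pn) by bisection over the binary kl. The names `binary_kl`, `invert_kl`, `empirical_risk`, and `rhs_bound` are placeholders introduced here.

```python
import math

def binary_kl(q, p):
    """Binary kl(q || p) for Bernoulli parameters, clipped away from the endpoints."""
    eps = 1e-12
    q = min(max(q, eps), 1 - eps)
    p = min(max(p, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def invert_kl(empirical_risk, rhs_bound, tol=1e-9):
    """Largest p with kl(empirical_risk || p) <= rhs_bound, by bisection on [empirical_risk, 1]."""
    lo, hi = empirical_risk, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binary_kl(empirical_risk, mid) <= rhs_bound:
            lo = mid   # mid still satisfies the bound; search higher
        else:
            hi = mid   # mid violates the bound; search lower
    return hi

# Example: empirical risk 0.1 and right-hand side 0.05 yield an upper bound on the true risk.
print(invert_kl(0.1, 0.05))
```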
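
The preprocessing and splitting described under Open Datasets and Dataset Splits (0-1 encoding of categorical variables, dropping rows with missing features, scaling every column into [-1, 1], and a 5-fold train-test split averaged over 5 runs) can be sketched as follows. This is a plausible reconstruction under assumptions, not the authors' pipeline: the pandas/scikit-learn calls, the max-absolute-value scaling, and the `compute_bound` callback are choices made here for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

def preprocess(df, target_column):
    """Drop rows with missing values, 0-1 encode categoricals, scale every column into [-1, 1]."""
    df = df.dropna()
    y = df[target_column].to_numpy()
    X = pd.get_dummies(df.drop(columns=[target_column]))  # 0-1 encoding increases the dimension d
    X = X.to_numpy(dtype=float)
    max_abs = np.abs(X).max(axis=0)
    max_abs[max_abs == 0] = 1.0                            # guard against constant-zero columns
    return X / max_abs, y                                  # one way to map columns into [-1, 1]

def average_bound_over_folds(X, y, compute_bound, seed=0):
    """Average a bound over the 5 training folds of a 5-fold split (training size n is 4/5 of the data)."""
    values = []
    for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=seed).split(X):
        values.append(compute_bound(X[train_idx], y[train_idx]))
    return float(np.mean(values))
```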
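
The parameter choices under Experiment Setup (the grid of posterior variances σ² ∈ {1/2, ..., 1/2^J} with J = ⌈log₂ n⌉, the union-bound replacement of δ by δ/⌈log₂ n⌉, and the Monte Carlo approximation of Gaussian expectations) can be sketched as below. Again a minimal illustration under stated assumptions, not the authors' implementation; `bound_value` is a hypothetical placeholder for whichever bound is being minimized, and the BFGS step for (8) is omitted.

```python
import math
import numpy as np

delta = 0.05   # confidence level used in the paper
n = 1000       # example training set size

# Grid of posterior variances: sigma^2 in {1/2, 1/4, ..., 1/2^J} with J = ceil(log2 n).
J = math.ceil(math.log2(n))
sigma2_grid = [2.0 ** -j for j in range(1, J + 1)]

# Union bound over the grid: evaluate each bound at delta / ceil(log2 n) so that the
# minimum over the grid still holds with probability at least 1 - delta.
delta_per_point = delta / J

def mc_gaussian_expectation(f, mean, sigma2, num_samples=10_000, seed=0):
    """Approximate E[f(w)] for w ~ N(mean, sigma2 * I) by Monte Carlo sampling."""
    rng = np.random.default_rng(seed)
    w = mean + math.sqrt(sigma2) * rng.standard_normal((num_samples, mean.shape[0]))
    return float(np.mean([f(wi) for wi in w]))

# Hypothetical selection of the variance minimizing a bound on the given data:
# best_sigma2 = min(sigma2_grid, key=lambda s2: bound_value(s2, delta_per_point))
```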