PAC-Bayes-Chernoff bounds for unbounded losses
Authors: Ioar Casado Telletxea, Luis Antonio Ortega Andrés, Aritz Pérez, Andrés Masegosa
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 1: Models with very different CGFs coexist within the same model class. On the left, we display several metrics for Inception V3 models trained on CIFAR10 without regularization (Standard) and with L2 regularization (L2). Random refers to a model learned over randomly reshuffled labels and Zero refers to a model where all the weights are equal to zero. For each model, the metrics include train and test accuracy, test log-loss, the ℓ2-norm of the parameters of the model, the variance of the log-loss function, denoted Vν(ℓ(x, θ)), and the expected norm of the input gradients, denoted Eν‖∇xℓ(x, θ)‖₂². On the right, we display the estimated CGFs of each model, following Masegosa and Ortega (2023). Note how models with smaller variance Vν(ℓ(x, θ)), ℓ2-norm, or input-gradient norm Eν‖∇xℓ(x, θ)‖₂² have smaller CGFs. Bounds derived from Theorem 7 naturally exploit these differences. Experimental details in Appendix C. |
| Researcher Affiliation | Academia | Ioar Casado, Machine Learning Group, Basque Center for Applied Mathematics (BCAM), icasado@bcamath.org; Luis A. Ortega, Machine Learning Group, Computer Science Dept., EPS, Universidad Autónoma de Madrid, luis.ortega@uam.es; Aritz Pérez, Machine Learning Group, Basque Center for Applied Mathematics (BCAM), aperez@bcamath.org; Andrés R. Masegosa, Department of Computer Science, Aalborg University, arma@cs.aau.dk |
| Pseudocode | No | No pseudocode or algorithm blocks are present in the paper. |
| Open Source Code | Yes | We include a Jupyter Notebook in the Supplementary Material with the code for reproducing our experiments and figures. |
| Open Datasets | Yes | We trained the model in the CIFAR10 dataset (Krizhevsky et al., 2009) with the default train/test split... |
| Dataset Splits | No | We trained the model in the CIFAR10 dataset (Krizhevsky et al., 2009) with the default train/test split using SGD with momentum 0.9 and learning rate 0.01 with exponential decay of 0.95. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are provided. |
| Software Dependencies | No | The paper mentions using a 'Jupyter Notebook' and general terms like 'SGD' but does not provide specific version numbers for software dependencies like Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | We trained the model in the CIFAR10 dataset (Krizhevsky et al., 2009) with the default train/test split using SGD with momentum 0.9 and learning rate 0.01 with exponential decay of 0.95. All models are trained for 30.000 iterations of batches of size 200 or until the train loss is under 0.005. These settings are selected to ensure that the random label model converges to an interpolator. For ℓ2 regularization, the multiplicative factor is 0.01. |
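
The training recipe quoted in the Experiment Setup row is concrete enough to sketch. Below is a minimal, hedged PyTorch reconstruction of that configuration: SGD with momentum 0.9, learning rate 0.01 with exponential decay 0.95, batches of 200, up to 30,000 iterations or until the train loss drops below 0.005, and an L2 factor of 0.01 for the regularized variant. The small CNN stands in for the paper's Inception V3 (the exact CIFAR-10 Inception variant is not specified in the quoted text), and the remaining names and the decay granularity are assumptions, not the authors' code.

```python
# Hedged sketch of the quoted training configuration; all names are illustrative.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

BATCH_SIZE = 200          # "batches of size 200"
MAX_ITERS = 30_000        # "30.000 iterations", read as 30,000
LOSS_THRESHOLD = 0.005    # "or until the train loss is under 0.005"
L2_FACTOR = 0.01          # "the multiplicative factor is 0.01" (L2 variant; 0.0 for Standard)

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor()
)
loader = torch.utils.data.DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)

model = nn.Sequential(  # stand-in for the paper's Inception V3
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(128, 10),
)

# SGD with momentum 0.9 and lr 0.01; weight_decay implements the L2-regularized model.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9,
                            weight_decay=L2_FACTOR)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)
criterion = nn.CrossEntropyLoss()

it, done = 0, False
while not done:
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        it += 1
        if it >= MAX_ITERS or loss.item() < LOSS_THRESHOLD:
            done = True
            break
    # "exponential decay of 0.95"; whether decay is per epoch or per step is not stated
    scheduler.step()
```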
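
The Figure 1 caption also reports two per-model diagnostics: the variance of the log-loss, Vν(ℓ(x, θ)), and the expected squared input-gradient norm, Eν‖∇xℓ(x, θ)‖₂². Below is a hedged sketch of how such estimates could be computed in PyTorch; the function name, batching, and the choice of evaluation data are assumptions rather than the authors' implementation.

```python
# Hedged sketch of the Figure 1 diagnostics: variance of the per-example log-loss
# and mean squared l2-norm of the loss gradient with respect to the input.
import torch
import torch.nn.functional as F

def log_loss_diagnostics(model, loader, device="cpu"):
    model.eval().to(device)
    losses, grad_sq_norms = [], []
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x.requires_grad_(True)
        # per-example log-loss (negative log-likelihood of the true class)
        loss = F.cross_entropy(model(x), y, reduction="none")
        # gradient of the summed loss w.r.t. x yields per-example input gradients
        grads, = torch.autograd.grad(loss.sum(), x)
        losses.append(loss.detach())
        grad_sq_norms.append(grads.flatten(1).pow(2).sum(dim=1))
    losses = torch.cat(losses)
    grad_sq_norms = torch.cat(grad_sq_norms)
    return losses.var().item(), grad_sq_norms.mean().item()
```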