Learning via Wasserstein-Based High Probability Generalisation Bounds
Authors: Paul Viallard, Maxime Haddouche, Umut Simsekli, Benjamin Guedj
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As a result, we derive novel Wasserstein-based PAC-Bayesian learning algorithms and we illustrate their empirical advantage on a variety of experiments. We present in Table 1 the performance of Algorithms 1 and 2 compared to the Empirical Risk Minimisation (ERM) and the Online Gradient Descent (OGD) with the COCOB-Backprop optimiser. |
| Researcher Affiliation | Academia | Paul Viallard, Inria, CNRS, Ecole Normale Supérieure, PSL Research University, Paris, France (paul.viallard@inria.fr); Maxime Haddouche, Inria, University College London and Université de Lille, France (maxime.haddouche@inria.fr); Umut Simsekli, Inria, CNRS, Ecole Normale Supérieure, PSL Research University, Paris, France (umut.simsekli@inria.fr); Benjamin Guedj, Inria and University College London, France and UK (benjamin.guedj@inria.fr) |
| Pseudocode | Yes | Algorithm 1 (Mini-)Batch Learning Algorithm with Wasserstein distances and Algorithm 2 Online Learning Algorithm with Wasserstein distances in Appendix C. |
| Open Source Code | Yes | All the experiments are reproducible with the source code provided on GitHub at https://github.com/paulviallard/NeurIPS23-PB-Wasserstein. |
| Open Datasets | Yes | We study the performance of Algorithms 1 and 2 on UCI datasets [DG17] along with MNIST [LeC98] and Fashion MNIST [XRV17]. |
| Dataset Splits | No | We also split all the data (from the original training/test set) in two halves; the first part of the data serves in the algorithm (and is considered as a training set), while the second part is used to approximate the population risks R_µ(h) and C_µ (and is considered as a testing set). The paper describes splitting data into training and testing sets but does not explicitly mention a separate validation set. (A minimal data-split sketch is given after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. It only refers to 'models' without specifying the underlying hardware. |
| Software Dependencies | No | The paper mentions using the 'COCOB-Backprop optimiser [OT17]' and implicitly references 'Pytorch' for the multi-margin loss function, but it does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | To perform the gradient steps, we use the COCOB-Backprop optimiser [OT17] (with parameter α = 10000). For Algorithm 1, which solves Equation (5), we fix a batch size of 100, i.e., \|U\| = 100, and the number of epochs T is fixed to perform at least 20000 iterations. Regarding Algorithm 2, which solves Equation (7), we set t = 100 for the log barrier, which is enough to constrain the weights, and the number of iterations to T = 10. In the following, we consider D = 600 and L = 2; more experiments are considered in the appendix. We initialise the network similarly to [DR17] by sampling the weights from a Gaussian distribution with zero mean and a standard deviation of σ = 0.04; the weights are further clipped between −2σ and +2σ. Moreover, the values in the biases b_1, ..., b_L are set to 0.1, while the values for b are set to 0. (A minimal initialisation sketch follows the table.) |
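
The two-halves data split quoted in the Dataset Splits row is straightforward to reproduce. Below is a minimal sketch assuming a generic array-based dataset `(X, y)`; the helper name `half_split` and the fixed seed are illustrative assumptions, not the authors' code.

```python
import numpy as np

def half_split(X, y, seed=0):
    """Split (X, y) into two halves: the first half is fed to the learning
    algorithm (training set), the second half is used to approximate the
    population risk (testing set), mirroring the split described in the paper."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    mid = len(X) // 2
    train, test = idx[:mid], idx[mid:]
    return (X[train], y[train]), (X[test], y[test])
```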
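
The network construction and initialisation described in the Experiment Setup row can be sketched in PyTorch. This is a hedged illustration under assumptions, not the authors' implementation: it assumes D = 600 is the hidden-layer width and L = 2 the number of hidden layers, and the helper name `build_network` and the MNIST-sized dimensions are hypothetical. Training would then combine this network with the COCOB-Backprop optimiser (α = 10000) from the authors' repository and PyTorch's multi-margin loss, which are not reproduced here.

```python
import torch
import torch.nn as nn

def build_network(in_dim: int = 784, out_dim: int = 10,
                  width: int = 600, depth: int = 2,
                  sigma: float = 0.04) -> nn.Sequential:
    """Fully connected ReLU network with `depth` hidden layers of size `width`,
    initialised as in the quoted setup: Gaussian weights (mean 0, std sigma)
    clipped to [-2*sigma, +2*sigma], hidden biases 0.1, output bias 0."""
    layers, prev = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(prev, width), nn.ReLU()]
        prev = width
    layers.append(nn.Linear(prev, out_dim))
    net = nn.Sequential(*layers)

    linear_layers = [m for m in net if isinstance(m, nn.Linear)]
    for i, layer in enumerate(linear_layers):
        with torch.no_grad():
            layer.weight.normal_(0.0, sigma).clamp_(-2 * sigma, 2 * sigma)
            layer.bias.fill_(0.1 if i < len(linear_layers) - 1 else 0.0)
    return net

net = build_network()
loss_fn = nn.MultiMarginLoss()  # PyTorch's multi-margin loss mentioned in the paper
```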