Learning Stochastic Majority Votes by Minimizing a PAC-Bayes Generalization Bound
Authors: Valentina Zantedeschi, Paul Viallard, Emilie Morvant, Rémi Emonet, Amaury Habrard, Pascal Germain, Benjamin Guedj
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically evaluate STOCMV, and we compare its generalization bounds and test errors to those obtained with PAC-Bayesian methods learning majority votes. |
| Researcher Affiliation | Academia | Valentina Zantedeschi (1,2,3), Paul Viallard (4), Emilie Morvant (4), Rémi Emonet (4), Amaury Habrard (4), Pascal Germain (5), Benjamin Guedj (1,2,3). 1: Inria, Lille Nord Europe research centre, France; 2: The Inria London Programme, France and UK; 3: University College London, Department of Computer Science, Centre for Artificial Intelligence, UK; 4: Univ Lyon, UJM-Saint-Etienne, CNRS, Institut d'Optique Graduate School, Laboratoire Hubert Curien UMR 5516, F-42023, Saint-Etienne, France; 5: Département d'informatique et de génie logiciel, Université Laval, Québec, Canada |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Code, available at https://github.com/vzantedeschi/StocMV, was implemented in PyTorch [Paszke et al., 2019] and all experiments were run on a virtual machine with 8 vCPUs and 128GB of RAM. |
| Open Datasets | Yes | We study the performance of our method on the binary classification two-moons dataset, with 2 features, 2 classes and $\mathcal{N}(0, 0.05)$ Gaussian noise, for which we draw n points for training, and 1,000 points for testing. [...] We consider several classification datasets from UCI [Dua and Graff, 2017], LIBSVM and Zalando [Xiao et al., 2017], with different numbers of features and instances. |
| Dataset Splits | No | No explicit training/validation/test splits with exact percentages or sample counts are provided for the main models. For the two-moons dataset, only 'n points for training, and 1,000 points for testing' is given. For data-dependent priors, the paper states that the training data S is split into two subsets, $S_{\le m} = \{(x_i, y_i)\}_{i=1}^{m}$ and $S_{>m} = \{(x_i, y_i)\}_{i=m+1}^{n}$. The 'patience equal to 25 for early stopping' implies a validation set, but its size and usage are not detailed for the main learning process. |
| Hardware Specification | Yes | Code, available at https://github.com/vzantedeschi/StocMV, was implemented in PyTorch [Paszke et al., 2019] and all experiments were run on a virtual machine with 8 vCPUs and 128GB of RAM. |
| Software Dependencies | No | The paper mentions 'pytorch [Paszke et al., 2019]' but does not provide a version number for it or for any other software dependencies, which would be needed for exact reproducibility. |
| Experiment Setup | Yes | For this set of experiments, we optimize Seeger's Bound (Equation (1)) by (batch) Gradient Descent, for 1,000 iterations and with learning rate equal to 0.1. [...] We train the models by Stochastic Gradient Descent (SGD) using Adam [Kingma and Ba, 2015] with (0.9, 0.999) running average coefficients, batch size equal to 1024 and learning rate equal to 0.1 with a scheduler reducing this parameter by a factor of 10 with 2 epochs patience. We fix the maximal number of epochs to 100 and patience equal to 25 for early stopping, and for MC we fix T = 10 to increase randomness. |
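
The dataset description in the "Open Datasets" row can be made concrete with a short, hedged sketch. The snippet below is not taken from the authors' repository; it only illustrates one plausible way to generate the two-moons data as described (2 features, 2 classes, Gaussian noise, n training points and a fixed 1,000-point test set), assuming the 0.05 in N(0, 0.05) is the noise standard deviation passed to scikit-learn's `make_moons`.

```python
# Hypothetical reconstruction of the two-moons setup quoted above; not the authors' code.
# Assumes 0.05 is used directly as the noise standard deviation expected by make_moons.
import torch
from sklearn.datasets import make_moons

def make_two_moons(n_train, n_test=1000, noise=0.05, seed=0):
    X, y = make_moons(n_samples=n_train + n_test, noise=noise, random_state=seed)
    X = torch.as_tensor(X, dtype=torch.float32)   # 2 features per point
    y = torch.as_tensor(y, dtype=torch.long)      # 2 classes (0/1)
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])

# Example: 500 training points and the fixed 1,000-point test set.
(train_X, train_y), (test_X, test_y) = make_two_moons(n_train=500)
```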
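
Similarly, the optimizer and scheduler settings quoted in the "Experiment Setup" row can be summarised as a minimal PyTorch sketch. This is an illustration under stated assumptions, not the authors' training loop: the model, data loader and PAC-Bayes bound objective (Seeger's bound, averaged over T = 10 Monte Carlo samples in the paper) are placeholders supplied by the caller, and the quantity monitored for the scheduler and early stopping is assumed to be the training objective, which the table does not specify.

```python
# Hypothetical sketch of the quoted optimization settings: Adam with betas (0.9, 0.999),
# lr 0.1, batch size 1024 (set in the DataLoader), LR reduced by 10x after 2 epochs
# without improvement, at most 100 epochs, early stopping with patience 25.
import torch

def train_with_paper_settings(model, train_loader, bound_objective,
                              max_epochs=100, early_stop_patience=25):
    """Train `model` by minimizing `bound_objective(model, x, y)` (a scalar tensor)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=0.1, betas=(0.9, 0.999))
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=2)

    best, stale = float("inf"), 0
    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = bound_objective(model, x, y)   # e.g. bound estimate over T = 10 MC samples
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        scheduler.step(epoch_loss)                # monitored quantity assumed, not specified above
        if epoch_loss < best:
            best, stale = epoch_loss, 0
        else:
            stale += 1
            if stale >= early_stop_patience:      # simple early stopping on the same quantity
                break
    return model
```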