Learning Stochastic Majority Votes by Minimizing a PAC-Bayes Generalization Bound

Authors: Valentina Zantedeschi, Paul Viallard, Emilie Morvant, Rémi Emonet, Amaury Habrard, Pascal Germain, Benjamin Guedj

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we empirically evaluate StocMV, and we compare its generalization bounds and test errors to those obtained with PAC-Bayesian methods learning majority votes.
Researcher Affiliation | Academia | Valentina Zantedeschi (1,2,3), Paul Viallard (4), Emilie Morvant (4), Rémi Emonet (4), Amaury Habrard (4), Pascal Germain (5), Benjamin Guedj (1,2,3). 1 Inria, Lille Nord Europe research centre, France; 2 The Inria London Programme, France and UK; 3 University College London, Department of Computer Science, Centre for Artificial Intelligence, UK; 4 Univ Lyon, UJM-Saint-Etienne, CNRS, Institut d'Optique Graduate School, Laboratoire Hubert Curien UMR 5516, F-42023, Saint-Etienne, France; 5 Département d'informatique et de génie logiciel, Université Laval, Québec, Canada
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Code, available at https://github.com/vzantedeschi/StocMV, was implemented in pytorch [Paszke et al., 2019] and all experiments were run on a virtual machine with 8 vCPUs and 128Gb of RAM.
Open Datasets | Yes | We study the performance of our method on the binary classification two-moons dataset, with 2 features, 2 classes and N(0, 0.05) Gaussian noise, for which we draw n points for training, and 1,000 points for testing. [...] We consider several classification datasets from UCI [Dua and Graff, 2017], LIBSVM and Zalando [Xiao et al., 2017], of different number of features and of instances. (A hedged generation sketch for the two-moons data appears after this table.)
Dataset Splits | No | No explicit training/validation/test split sizes (exact percentages or sample counts) are given for the main models. For the two-moons dataset, only 'n points for training, and 1,000 points for testing' is stated. For data-dependent priors, the training data S is split into two subsets, S≤m = {(x_i, y_i)}_{i=1}^{m} and S>m = {(x_i, y_i)}_{i=m+1}^{n}, and the 'patience equal to 25 for early stopping' implies a validation set, but its size and usage are not detailed for the main learning process. (A split sketch appears after this table.)
Hardware Specification | Yes | Code, available at https://github.com/vzantedeschi/StocMV, was implemented in pytorch [Paszke et al., 2019] and all experiments were run on a virtual machine with 8 vCPUs and 128Gb of RAM.
Software Dependencies | No | The paper mentions 'pytorch [Paszke et al., 2019]' but does not give a version number for it or list any other software dependencies, which would be needed for reproducibility.
Experiment Setup | Yes | For this set of experiments, we optimize Seeger's Bound (Equation (1)) by (batch) Gradient Descent, for 1,000 iterations and with learning rate equal to 0.1. [...] We train the models by Stochastic Gradient Descent (SGD) using Adam [Kingma and Ba, 2015] with (0.9, 0.999) running average coefficients, batch size equal to 1024 and learning rate equal to 0.1 with a scheduler reducing this parameter of a factor of 10 with 2 epochs patience. We fix the maximal number of epochs to 100 and patience equal to 25 for early stopping, and for MC we fix T = 10 to increase randomness. (A hedged training-setup sketch appears after this table.)
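
For reference, the two-moons data described in the Open Datasets row can be generated with scikit-learn. This is a hypothetical sketch, not the authors' code: the use of make_moons, the interpretation of the noise level as a standard deviation of 0.05, and the choice of n = 500 training points are assumptions made for illustration.

    from sklearn.datasets import make_moons

    # Hypothetical reconstruction of the two-moons setup: 2 features, 2 classes,
    # Gaussian noise N(0, 0.05) (interpreted here as standard deviation 0.05).
    n = 500  # assumed training-set size; the paper draws n training points without fixing n here
    X_train, y_train = make_moons(n_samples=n, noise=0.05, random_state=0)
    X_test, y_test = make_moons(n_samples=1000, noise=0.05, random_state=1)  # 1,000 test points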
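
The data-dependent prior split mentioned in the Dataset Splits row amounts to cutting the training sample in two. The sketch below is hypothetical and reuses X_train and y_train from the sketch above; the choice m = n // 2 is an assumption, since the report does not state how m is chosen.

    # Hypothetical split of S into S_{<=m} (used to learn a data-dependent prior)
    # and S_{>m} (used to evaluate/optimize the bound); m = n // 2 is an assumed choice.
    m = len(X_train) // 2
    S_le_m = (X_train[:m], y_train[:m])
    S_gt_m = (X_train[m:], y_train[m:])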
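
Finally, the Experiment Setup row translates into a standard PyTorch training loop. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the data, model, and objective are placeholders (the paper minimizes Seeger's bound for a stochastic majority vote with T = 10 Monte Carlo samples, which is not reproduced here), while the optimizer, scheduler, batch size, epoch budget, and early-stopping patience follow the reported values.

    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset

    # Placeholder data and model: stand-ins for the datasets and the stochastic
    # majority vote studied in the paper.
    X = torch.randn(5000, 2)
    y = (X[:, 0] * X[:, 1] > 0).float()
    loader = DataLoader(TensorDataset(X, y), batch_size=1024, shuffle=True)  # batch size 1024
    model = nn.Linear(2, 1)
    objective = nn.BCEWithLogitsLoss()  # stand-in for the PAC-Bayes (Seeger) bound

    optimizer = torch.optim.Adam(model.parameters(), lr=0.1, betas=(0.9, 0.999))
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=2)

    best, stale = float("inf"), 0
    for epoch in range(100):  # maximal number of epochs
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = objective(model(xb).squeeze(1), yb)
            loss.backward()
            optimizer.step()
        with torch.no_grad():
            epoch_loss = objective(model(X).squeeze(1), y).item()
        scheduler.step(epoch_loss)  # learning rate divided by 10 after 2 epochs without improvement
        if epoch_loss < best:
            best, stale = epoch_loss, 0
        else:
            stale += 1
            if stale >= 25:  # early-stopping patience of 25 epochs
                break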