Dichotomize and Generalize: PAC-Bayesian Binary Activated Deep Neural Networks
Authors: Gaël Letarte, Pascal Germain, Benjamin Guedj, François Laviolette
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The performance of our approach is assessed on a thorough numerical experiment protocol on real-life datasets. (Section 6, Numerical experiments) |
| Researcher Affiliation | Academia | Gaël Letarte, Université Laval, Canada (gael.letarte.1@ulaval.ca); Pascal Germain, Inria, France (pascal.germain@inria.fr); Benjamin Guedj, Inria and University College London, France and United Kingdom (benjamin.guedj@inria.fr); François Laviolette, Université Laval, Canada (francois.laviolette@ift.ulaval.ca) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks clearly labeled as 'Algorithm' or 'Pseudocode'. |
| Open Source Code | No | The paper does not provide an unambiguous statement or a direct link for the open-source code specific to the methodology described in this paper. |
| Open Datasets | Yes | The experiments were conducted on six binary classification datasets from the UCI Machine Learning Repository [Dua and Graff, 2017]: Adult, Ads, Chess, MNIST-17, MNIST-49 and MNIST-56. The dataset used to generate the results in Table 1, and Figure 5 in the appendix, is available at https://github.com/ghislainbourdeau/BAM. |
| Dataset Splits | Yes | MLP: we optimize the linear loss as the cost function and use 20% of the training data as a validation set for hyperparameter selection. PBGNetpre: we also explore using part of the training data for a pre-training step; to do so, we split the training set into two halves. (See the split sketch after this table.) |
| Hardware Specification | Yes | We gratefully acknowledge the support of NVIDIA Corporation with the donation of Titan Xp GPUs used for this research. |
| Software Dependencies | No | The paper mentions 'PyTorch' and 'Poutyne' but does not specify their version numbers. It mentions the 'Adam optimizer' by name but not its software version within a specific library. |
| Experiment Setup | Yes | For all experiments, we train our models for at most 200 epochs and use a batch size of 20. The learning rate of the Adam optimizer is set to 0.001. Early stopping interrupts training when the cost function value has not improved for 20 consecutive epochs. Network architectures explored range from 1 to 3 hidden layers (L) with a hidden size h ∈ {10, 50, 100} (d_k = h for 1 ≤ k < L). (See the configuration sketch after this table.) |
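
The dataset-split protocol quoted in the table (a 20% validation holdout for the MLP baseline, and a half/half split of the training set for PBGNetpre pre-training) can be sketched as follows. This is a minimal illustration assuming scikit-learn; the function name `make_splits` and the array names `X`, `y` are hypothetical and not taken from the authors' code.

```python
# Illustrative sketch of the reported data-splitting protocol;
# make_splits, X, y and the use of scikit-learn are assumptions, not the authors' code.
from sklearn.model_selection import train_test_split

def make_splits(X, y, pretraining=False, seed=42):
    if pretraining:
        # PBGNetpre: split the training set into two halves, one used for
        # the pre-training step and one for the final training phase.
        X_pre, X_train, y_pre, y_train = train_test_split(
            X, y, test_size=0.5, random_state=seed)
    else:
        X_pre, y_pre = None, None
        X_train, y_train = X, y

    # MLP baseline: hold out 20% of the training data as a validation set
    # for hyperparameter selection.
    X_train, X_val, y_train, y_val = train_test_split(
        X_train, y_train, test_size=0.2, random_state=seed)
    return (X_pre, y_pre), (X_train, y_train), (X_val, y_val)
```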
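
The experiment setup quoted in the table (at most 200 epochs, batch size 20, Adam with learning rate 0.001, early stopping after 20 epochs without improvement, and architectures with 1 to 3 hidden layers of width 10, 50 or 100) can be collected into a small configuration grid. The sketch below assumes PyTorch; the helper names are illustrative placeholders and not the authors' implementation.

```python
# Hyperparameter grid and optimizer settings quoted in the table above.
# The helper names below are illustrative, not the authors' code.
import itertools
import torch

GRID = {
    "hidden_layers": [1, 2, 3],      # number of hidden layers L
    "hidden_size": [10, 50, 100],    # h, with d_k = h for every hidden layer
}
MAX_EPOCHS = 200                      # upper bound on training epochs
BATCH_SIZE = 20
LEARNING_RATE = 1e-3                  # Adam learning rate reported in the paper
EARLY_STOPPING_PATIENCE = 20          # epochs without cost improvement before stopping

def architecture_grid():
    """Enumerate the (L, h) combinations explored in the experiments."""
    for n_layers, width in itertools.product(GRID["hidden_layers"],
                                             GRID["hidden_size"]):
        yield {"hidden_layers": n_layers, "hidden_size": width}

def make_optimizer(model):
    """Adam optimizer with the reported learning rate of 0.001."""
    return torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
```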