Posterior Concentration for Sparse Deep Learning

Authors: Nicholas G. Polson, Veronika Ročková

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We introduce Spike-and-Slab Deep Learning (SS-DL), a fully Bayesian alternative to dropout for improving generalizability of deep ReLU networks. This new type of regularization enables provable recovery of smooth input-output maps with unknown levels of smoothness. Indeed, we show that the posterior distribution concentrates at the near minimax rate for α-Hölder smooth maps, performing as well as if we knew the smoothness level α ahead of time. Our result sheds light on architecture design for deep neural networks, namely the choice of depth, width and sparsity level. These network attributes typically depend on unknown smoothness in order to be optimal. We obviate this constraint with the fully Bayes construction. As an aside, we show that SS-DL does not overfit in the sense that the posterior concentrates on smaller networks with fewer (up to the optimal number of) nodes and links. Our results provide new theoretical justifications for deep ReLU networks from a Bayesian point of view.
Researcher Affiliation | Academia | Nicholas G. Polson and Veronika Ročková, Booth School of Business, University of Chicago, Chicago, IL 60637
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not mention providing open-source code for the Spike-and-Slab Deep Learning (SS-DL) methodology described.
Open Datasets | No | The paper describes simulating data for a motivating example: 'We simulate data from the following polynomial f1(x1, x2) = (x1^2 x2^2 − x1^2 x2 + 1)^2 where (x1, x2) take values in [−1, 1]^2. We discretize the grid for a total training data of 201 × 201 = 40401 observations.' However, this is simulated data, and no access information (link, DOI, or specific citation to a publicly available dataset) is provided. (A data-generation sketch based on this description appears below the table.)
Dataset Splits | No | The paper mentions 'MSE(train) = 0.0229, MSE(validation) = 0.0112' for a motivating example, indicating that a validation set was used. However, it does not specify the splitting percentages, sample counts, or the methodology for creating these splits from the total of 40401 observations, which is necessary for reproduction.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the illustrative motivating example.
Software Dependencies | No | The paper mentions that the models for the motivating example were 'trained with SGD in TensorFlow and Keras'. However, it does not specify version numbers for these software components, which are necessary for reproducibility.
Experiment Setup | No | For the motivating example, the paper describes the network architecture (e.g., '11-layer deep ReLU network', '9 units in the first hidden layer and 3 units in the further layers', 'All activation functions are ReLU') and mentions training with 'SGD'. However, it does not provide specific hyperparameter values such as learning rates, batch sizes, or other detailed optimizer settings required for experiment reproduction. (A hedged architecture sketch based on these quotes appears below the table.)
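
The Research Type row above quotes the abstract, which frames SS-DL as a spike-and-slab prior over network weights used in place of dropout. As a point of reference only, the display below sketches the generic form of a spike-and-slab prior on a single weight w_j; the paper's exact slab density and its hyperprior on the inclusion probability θ are not reproduced here, and the notation is illustrative.

```latex
% Generic spike-and-slab prior on one network weight w_j (illustrative notation,
% not the paper's exact specification):
%   gamma_j = 1  -> w_j is drawn from a continuous slab density \tilde{\pi}
%   gamma_j = 0  -> w_j is set exactly to zero (Dirac spike), pruning the link
\[
  \pi(w_j \mid \gamma_j) \;=\; \gamma_j\,\widetilde{\pi}(w_j) \;+\; (1-\gamma_j)\,\delta_0(w_j),
  \qquad
  \gamma_j \mid \theta \sim \mathrm{Bernoulli}(\theta).
\]
```

Under such a prior, sparsity is governed by the posterior over the binary inclusion indicators γ_j, which is consistent with the abstract's claim that the posterior concentrates on smaller networks with fewer nodes and links.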
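
The Open Datasets row quotes a simulated-data description whose formula had to be reconstructed from a garbled extraction (the middle minus sign in the polynomial is an assumption). Under that reading, a minimal NumPy sketch of the 201 × 201 grid simulation would be:

```python
import numpy as np

# 201 x 201 = 40401 grid points on [-1, 1]^2, as described in the quoted text.
grid = np.linspace(-1.0, 1.0, 201)
x1, x2 = np.meshgrid(grid, grid)

def f1(x1, x2):
    # Polynomial reconstructed from the quote; the minus sign is an assumption.
    return (x1**2 * x2**2 - x1**2 * x2 + 1.0) ** 2

X = np.column_stack([x1.ravel(), x2.ravel()])  # design matrix, shape (40401, 2)
y = f1(x1, x2).ravel()                         # responses; any noise model is unspecified
```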
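
The Software Dependencies and Experiment Setup rows note that the motivating example used an 11-layer deep ReLU network with 9 units in the first hidden layer and 3 units in the further layers, trained with SGD in TensorFlow/Keras, but give no learning rate, batch size, or library versions. The sketch below is one plausible reading of that description (11 hidden layers; the learning rate, batch size, and epoch count are placeholders, not values from the paper):

```python
from tensorflow import keras

def build_model(n_hidden=11):
    """One reading of '11-layer deep ReLU network, 9 units in the first
    hidden layer and 3 units in the further layers'."""
    layers = [keras.Input(shape=(2,)),
              keras.layers.Dense(9, activation="relu")]
    for _ in range(n_hidden - 1):
        layers.append(keras.layers.Dense(3, activation="relu"))
    layers.append(keras.layers.Dense(1))  # linear output for regression
    return keras.Sequential(layers)

model = build_model()
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),  # placeholder learning rate
              loss="mse")
# model.fit(X, y, batch_size=128, epochs=100, validation_split=0.2)  # placeholder settings
```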