Understanding Priors in Bayesian Neural Networks at the Unit Level

Authors: Mariia Vladimirova, Jakob Verbeek, Pablo Mesejo, Julyan Arbel

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate the result of Theorem 3.1 on a 100-layer MLP. The hidden layers of the neural network have H1 = 1000, H2 = 990, H3 = 980, ..., Hℓ = 1000 - 10(ℓ - 1), ..., H100 = 10 hidden units, respectively. The input x is a vector of features from R^(10^4). Figure 3 represents the tails of the first three, 10th, and 100th hidden layers' pre-nonlinearity marginal distributions on a logarithmic scale. Units of one layer have the same sub-Weibull distribution since they share the same input and prior on the corresponding weights. The curves are obtained as histograms from a sample of size 10^5 from the prior on the pre-nonlinearities, which is itself obtained by sampling 10^5 sets of weights W from the Gaussian prior (2) and forward propagation via (1). The input vector x is sampled with independent features from a standard normal distribution once and for all at the start. The nonlinearity φ is the ReLU function. Being a linear combination involving symmetric weights W, pre-nonlinearities g also have a symmetric distribution, so we visualize only their distribution on R+. Figure 3 corroborates our main result.
Researcher Affiliation | Academia | (1) Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France; (2) Moscow Institute of Physics and Technology, 141701 Dolgoprudny, Russia; (3) Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI), University of Granada, 18071 Granada, Spain.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper.
Open Datasets | No | The paper uses a synthetically generated input ('x is a vector of features from R^(10^4)', 'The input vector x is sampled with independent features from a standard normal distribution') but does not provide access information for a public dataset.
Dataset Splits | No | The paper describes a simulation setup involving sampling from a prior, but does not provide specific dataset split information (such as percentages or counts for training, validation, and testing) for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | We illustrate the result of Theorem 3.1 on a 100-layer MLP. The hidden layers of the neural network have H1 = 1000, H2 = 990, H3 = 980, ..., Hℓ = 1000 - 10(ℓ - 1), ..., H100 = 10 hidden units, respectively. The input x is a vector of features from R^(10^4). The nonlinearity φ is the ReLU function. The curves are obtained as histograms from a sample of size 10^5 from the prior on the pre-nonlinearities, which is itself obtained by sampling 10^5 sets of weights W from the Gaussian prior (2) and forward propagation via (1). The input vector x is sampled with independent features from a standard normal distribution once and for all at the start. (See the simulation sketch below.)
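
The experiment setup above is essentially a Monte Carlo simulation of the prior over pre-nonlinearities. The following is a minimal NumPy sketch of that procedure, not the authors' code: it assumes an i.i.d. standard normal prior on every weight (the paper's prior (2) is Gaussian, but its exact variance is not restated here), omits biases, and uses far fewer prior draws than the paper's 10^5 so that it runs in a reasonable time.

```python
import numpy as np

rng = np.random.default_rng(0)

n_layers = 100                                                   # depth of the MLP
widths = [1000 - 10 * (l - 1) for l in range(1, n_layers + 1)]   # H_l = 1000 - 10*(l - 1), so H_100 = 10
input_dim = 10**4                                                # x lives in R^(10^4)
n_samples = 100                                                  # reduced from the paper's 10^5 draws for speed
layers_of_interest = [1, 2, 3, 10, 100]

# The input is drawn once from a standard normal and kept fixed across all prior draws.
x = rng.standard_normal(input_dim)

def relu(z):
    return np.maximum(z, 0.0)

# For each prior draw of the weights, forward propagate and record one
# pre-nonlinearity per layer of interest (units within a layer share the
# same marginal prior, so a single unit is representative).
samples = {l: np.empty(n_samples) for l in layers_of_interest}
for s in range(n_samples):
    h = x
    for l, width in enumerate(widths, start=1):
        W = rng.standard_normal((width, h.shape[0]))   # assumed N(0, 1) prior on each weight
        g = W @ h                                      # pre-nonlinearity of layer l (no bias)
        if l in samples:
            samples[l][s] = g[0]
        h = relu(g)

# Crude tail summary: the normalized fourth moment is 3 for a Gaussian and
# grows as tails get heavier, in line with the sub-Weibull behaviour of
# Theorem 3.1. With many more draws, log-scale histograms of samples[l]
# give the picture shown in Figure 3 of the paper.
for l in layers_of_interest:
    g = samples[l] / samples[l].std()
    print(f"layer {l:3d}: E[(g/std)^4] ~ {np.mean(g**4):.2f}")
```

With enough draws (the paper's 10^5), plotting log-scale histograms of samples[l] for l = 1, 2, 3, 10, 100 reproduces the qualitative finding: the tails of the pre-nonlinearity prior become heavier as depth increases.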