Invariance of Weight Distributions in Rectified MLPs

Authors: Russell Tsuchida, Fred Roosta, Marcus Gallagher

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Figure 3, we empirically verify Propositions 1 and 4. In the one-hidden-layer case, the samples follow the blue curve j = 1, regardless of the specific multivariate t weight distribution, which varies with ν. We also observe that the universality of the equivalent kernel appears to hold for the distribution (7) regardless of the value of β, as predicted by theory. We discuss the relevance of the curves j = 1 in Section 5. (See the first sketch after this table.)
Researcher Affiliation | Academia | School of ITEE, University of Queensland, Brisbane, Queensland, Australia; School of Mathematics and Physics, University of Queensland, Brisbane, Queensland, Australia; International Computer Science Institute, Berkeley, California, USA.
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating that source code for the methodology is openly available.
Open Datasets | Yes | Figures 2 and 4 refer to the "CelebA dataset (Liu et al., 2015)" and "CHiME-3 embedded et05 real live speech data from the 4th CHiME Speech Separation and Recognition Challenge (Vincent et al., 2017; Barker et al., 2017)", which are established public datasets with citations.
Dataset Splits | No | The paper describes experiments and mentions the use of datasets, but it does not specify explicit training, validation, or test splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper empirically verifies its theoretical results but does not report any hardware specifications (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not list any specific software dependencies with version numbers (e.g., libraries, frameworks, solvers).
Experiment Setup | Yes | Section 4 and Figure 3 state: "Empirical samples from a network with between 1 and 128 hidden layers, 1000 hidden neurons in each layer, m = 1000 and weights coming from different symmetric distributions." Figure 4 adds: "The network tested has 1000 inputs, 1000 neurons in each layer, and LReLU activations with a = 0.2. The weights are randomly initialized from a Gaussian distribution. (Right) Weights initialized according to (8)." (The second sketch below iterates this deep-network regime.)
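As a concrete illustration of the invariance claim quoted in the Research Type row, the following minimal sketch (our own, not the authors' code; the function names `relu_kernel_angle` and `empirical_cos` and the particular test distributions are illustrative choices) estimates the normalized equivalent kernel of a one-hidden-layer ReLU network for several symmetric weight distributions, including a spherically symmetric multivariate t built as a Gaussian scale mixture, and compares it with the closed-form degree-1 arc-cosine kernel of Cho & Saul. Under the paper's hypotheses the empirical points should fall on the same curve for every distribution, matching the j = 1 behaviour described for Figure 3.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu_kernel_angle(theta):
    """Closed-form normalized equivalent kernel for ReLU with iid
    zero-mean, finite-variance weights (degree-1 arc-cosine kernel)."""
    return (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / np.pi

def multivariate_t_rows(nu):
    """Rows distributed as a spherically symmetric multivariate t:
    Gaussian rows divided by an independent chi(nu)/sqrt(nu) scale."""
    def sample(shape):
        g = rng.standard_normal(shape)
        s = np.sqrt(rng.chisquare(nu, size=(shape[0], 1)) / nu)
        return g / s
    return sample

def empirical_cos(x, y, sampler, n_hidden=1000, n_trials=20):
    """Monte Carlo estimate of the normalized hidden-layer kernel
    <relu(Wx), relu(Wy)> / (||relu(Wx)|| ||relu(Wy)||)."""
    vals = []
    for _ in range(n_trials):
        W = sampler((n_hidden, x.size))
        hx, hy = np.maximum(W @ x, 0.0), np.maximum(W @ y, 0.0)
        vals.append((hx @ hy) / (np.linalg.norm(hx) * np.linalg.norm(hy)))
    return np.mean(vals)

m = 1000  # input dimension, matching the paper's m = 1000
samplers = {
    "gaussian": lambda s: rng.standard_normal(s),
    "iid student-t (nu=3)": lambda s: rng.standard_t(3, s),
    "uniform": lambda s: rng.uniform(-1.0, 1.0, s),
    "multivariate t (nu=5)": multivariate_t_rows(5),
}

for theta in np.linspace(0.1, np.pi - 0.1, 5):
    # Two unit inputs with a prescribed angle theta between them.
    x = np.zeros(m); x[0] = 1.0
    y = np.zeros(m); y[0] = np.cos(theta); y[1] = np.sin(theta)
    estimates = "  ".join(
        f"{name}={empirical_cos(x, y, f):.3f}" for name, f in samplers.items())
    print(f"theta={theta:.2f}  theory={relu_kernel_angle(theta):.3f}  {estimates}")
```

With 1000 hidden neurons the estimates should agree closely with the closed-form curve for every distribution; the common random scale of each multivariate t row cancels in the normalized kernel, which is the rotational-invariance mechanism behind the paper's Proposition 4.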
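The Experiment Setup row describes networks of up to 128 layers with LReLU slope a = 0.2. This second sketch (again ours, not from the paper) iterates an angle recursion for the normalized LReLU equivalent kernel; the closed form used here is our derivation under iid zero-mean, unit-variance weights, obtained by writing LReLU(z) = a·z + (1 − a)·ReLU(z) and combining the identity and arc-cosine kernels. It traces how the angle between two inputs evolves with depth, i.e., the family of j-indexed curves in Figure 3.

```python
import numpy as np

def lrelu_angle_map(theta, a=0.2):
    """Angle between two inputs after one wide LReLU layer, using the
    normalized equivalent kernel for negative-side slope `a` (assumed
    closed form: an identity/arc-cosine kernel mixture)."""
    c = (2.0 * a * np.cos(theta)
         + (1.0 - a) ** 2
         * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / np.pi)
    return np.arccos(np.clip(c / (1.0 + a ** 2), -1.0, 1.0))

theta = np.pi / 2.0  # start from orthogonal inputs
for j in range(1, 129):  # depths matching the paper's 1..128 layers
    theta = lrelu_angle_map(theta)
    if j in (1, 2, 4, 8, 16, 32, 64, 128):
        print(f"j = {j:3d} layers: angle = {theta:.4f} rad")
```

The angle shrinks monotonically toward 0 with depth, so a deep randomly initialized network maps distinct inputs to nearly collinear representations; this collapse is the regime Figure 4 examines under Gaussian initialization and, in its right panel, under the paper's initialization (8).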