Invariance of Weight Distributions in Rectified MLPs
Authors: Russell Tsuchida, Fred Roosta, Marcus Gallagher
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Figure 3, we empirically verify Propositions 1 and 4. In the one hidden layer case, the samples follow the blue curve j = 1, regardless of the specific multivariate t weight distribution which varies with ν. We also observe that the universality of the equivalent kernel appears to hold for the distribution (7) regardless of the value of β, as predicted by theory. We discuss the relevance of the curves j = 1 in Section 5. |
| Researcher Affiliation | Academia | 1School of ITEE, University of Queensland, Brisbane, Queensland, Australia 2School of Mathematics and Physics, University of Queensland, Brisbane, Queensland, Australia 3International Computer Science Institute, Berkeley, California, USA. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that source code for the methodology is openly available. |
| Open Datasets | Yes | Figures 2 and 4 refer to the "CelebA dataset (Liu et al., 2015)" and the "CHiME3 embedded et05 real live speech data from The 4th CHiME Speech Separation and Recognition Challenge (Vincent et al., 2017; Barker et al., 2017)", which are established public datasets with citations. |
| Dataset Splits | No | The paper describes experiments and mentions the use of datasets, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper discusses empirical verification of theoretical results but does not provide any specific hardware specifications (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide any specific software dependencies with version numbers (e.g., libraries, frameworks, solvers). |
| Experiment Setup | Yes | In Section 4 and Figure 3, the paper states: "Empirical samples from a network with between 1 and 128 hidden layers, 1000 hidden neurons in each layer, m = 1000 and weights coming from different symmetric distributions." Also, Figure 4 mentions: "The network tested has 1000 inputs, 1000 neurons in each layer, and LReLU activations with a = 0.2. The weights are randomly initialized from a Gaussian distribution. (Right) Weights initialized according to (8)." A minimal simulation sketch of this setup appears after the table. |
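The experiment behind the Research Type and Experiment Setup rows — a deep random-weight MLP whose inter-input angle is tracked through LReLU layers and compared across weight distributions — can be approximated in a few lines of NumPy. This is a minimal sketch, not the authors' code: the depth of 16, the input angle theta0 = pi/4, the Student-t degrees of freedom nu = 5, and all function names are illustrative assumptions, and the weight-variance scaling is arbitrary because the normalized (cosine) kernel is scale-invariant.

```python
import numpy as np

rng = np.random.default_rng(0)

def lrelu(x, a=0.2):
    """LReLU activation with slope a on the negative part, as in Figure 4."""
    return np.where(x >= 0, x, a * x)

def angles_through_layers(w_sampler, n_in=1000, width=1000, depth=16,
                          theta0=np.pi / 4):
    """Propagate two inputs at angle theta0 through a random MLP and
    record the angle between their hidden representations at each layer."""
    x = np.zeros(n_in); x[0] = 1.0
    y = np.zeros(n_in); y[0] = np.cos(theta0); y[1] = np.sin(theta0)
    h_x, h_y = x, y
    angles = []
    for _ in range(depth):
        W = w_sampler((width, h_x.shape[0]))  # fresh i.i.d. weights per layer
        h_x, h_y = lrelu(W @ h_x), lrelu(W @ h_y)
        cos = h_x @ h_y / (np.linalg.norm(h_x) * np.linalg.norm(h_y))
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    return angles

# Two zero-mean symmetric weight distributions, both rescaled to entry
# variance 1/fan_in (the scale does not affect the normalized kernel).
def gauss(shape):
    return rng.normal(0.0, np.sqrt(1.0 / shape[1]), size=shape)

def student_t(shape, nu=5.0):
    # Var of a standard t with nu > 2 is nu/(nu - 2); rescale to 1/fan_in.
    scale = np.sqrt(1.0 / shape[1] / (nu / (nu - 2.0)))
    return rng.standard_t(nu, size=shape) * scale

# The paper's invariance result predicts nearly identical angle trajectories
# for both samplers, despite the heavy tails of the Student-t weights.
for name, sampler in [("Gaussian", gauss), ("Student-t (nu=5)", student_t)]:
    print(name, np.round(angles_through_layers(sampler), 3))
```

Running this should show the two angle trajectories agreeing to within Monte Carlo noise, which is the qualitative content of the Figure 3 verification quoted in the Research Type row.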