Critical initialisation in continuous approximations of binary neural networks
Authors: George Stamatescu, Federica Gerace, Carlo Lucibello, Ian Fuss, Langford White
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We predict theoretically, and confirm numerically, that common weight initialisation schemes used in standard continuous networks, when applied to the mean values of the stochastic binary weights, yield poor training performance. This study shows that, contrary to common intuition, the means of the stochastic binary weights should be initialised close to 1 for deeper networks to be trainable. The results of the theoretical study, which are supported by numerical simulations and experiment, establish that for a surrogate of arbitrary depth to be trainable, it must be randomly initialised at criticality. (Section 4, Numerical and Experimental Results) |
| Researcher Affiliation | Academia | George Stamatescu, Ian Fuss and Langford B. White, School of Electrical and Electronic Engineering, University of Adelaide, Adelaide, Australia ({george.stamatescu}@gmail.com, {lang.white,ian.fuss}@adelaide.edu.au); Federica Gerace, Institut de Physique Théorique, CNRS & CEA & Université Paris-Saclay, Saclay, France (federicagerace91@gmail.com); Carlo Lucibello, Bocconi Institute for Data Science and Analytics, Bocconi University, Milan, Italy (carlo.lucibello@unibocconi.it) |
| Pseudocode | No | The paper describes mathematical derivations and theoretical frameworks but does not include any distinct pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include any statements about releasing source code for the methodology or provide links to a code repository. |
| Open Datasets | Yes | We use the MNIST dataset with reduced training set size (50%) and record the training performance (percentage of the training set correctly labeled) after 10 epochs of gradient descent over the training set, for various network depths L < 70 and different mean variances σ_m^2 ∈ [0, 1). |
| Dataset Splits | No | The paper mentions using a 'reduced training set size (50%)' of the MNIST dataset, but it does not specify a validation set or describe how the data was split into training, validation, or test sets for model evaluation. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory configurations. |
| Software Dependencies | No | The paper mentions using 'SGD with Adam (Kingma & Ba, 2014)' as the optimizer, but it does not provide version numbers for Adam or any other software libraries or dependencies used in the experiments. |
| Experiment Setup | Yes | We use the MNIST dataset with reduced training set size (50%) and record the training performance (percentage of the training set correctly labeled) after 10 epochs of gradient descent over the training set, for various network depths L < 70 and different mean variances σ_m^2 ∈ [0, 1). The optimizer used was SGD with Adam (Kingma & Ba, 2014) with a learning rate of 2 × 10^-4, chosen after a simple grid search, and a batch size of 64. A minimal code sketch of this setup follows the table. |
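
The sketch below makes the reported setup concrete. It is a minimal PyTorch sketch, not the authors' code: it assumes a plain fully connected surrogate in which the stochastic binary weights are replaced by their means, tanh activations, a hidden width of 256, and a mean initialisation m = ±sqrt(σ_m^2) so that the variance of the means equals σ_m^2. The details taken from the table above are the 50% MNIST training subset, 10 epochs, Adam with learning rate 2 × 10^-4, batch size 64, and the scan over depth and mean variance σ_m^2 ∈ [0, 1); everything else (architecture, width, exact initialisation law) is an illustrative assumption.

```python
# Minimal sketch of the reported experiment; architecture and the exact
# initialisation law are assumptions, not taken from the paper.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms


def init_binary_means_(layer: nn.Linear, sigma_m2: float) -> None:
    """Set each weight mean to +/- sqrt(sigma_m2), so Var(m) = sigma_m2 (assumed scheme)."""
    with torch.no_grad():
        signs = torch.randint(0, 2, layer.weight.shape).float() * 2 - 1
        layer.weight.copy_(signs * sigma_m2 ** 0.5)
        layer.bias.zero_()


def make_surrogate(depth: int, width: int, sigma_m2: float) -> nn.Sequential:
    """Fully connected surrogate whose weights are the means of the binary weights."""
    layers, d_in = [], 28 * 28
    for _ in range(depth):
        lin = nn.Linear(d_in, width)
        init_binary_means_(lin, sigma_m2)
        layers += [lin, nn.Tanh()]
        d_in = width
    head = nn.Linear(d_in, 10)
    init_binary_means_(head, sigma_m2)
    layers.append(head)
    return nn.Sequential(nn.Flatten(), *layers)


def train_accuracy(depth: int, sigma_m2: float, epochs: int = 10) -> float:
    full = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
    half = Subset(full, range(len(full) // 2))            # reduced training set (50%)
    loader = DataLoader(half, batch_size=64, shuffle=True)
    model = make_surrogate(depth, width=256, sigma_m2=sigma_m2)
    opt = torch.optim.Adam(model.parameters(), lr=2e-4)   # learning rate from the paper's grid search
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    with torch.no_grad():                                  # training-set accuracy, as reported
        correct = sum((model(x).argmax(1) == y).sum().item() for x, y in loader)
    return correct / len(half)


if __name__ == "__main__":
    for sigma_m2 in (0.1, 0.5, 0.9):                       # scan over mean variances in [0, 1)
        print(sigma_m2, train_accuracy(depth=20, sigma_m2=sigma_m2, epochs=1))
```

If the paper's claim holds, runs with σ_m^2 close to 1 should remain trainable at larger depths, while small-variance initialisations of the means should degrade as depth grows.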