Initialization of ReLUs for Dynamical Isometry

Authors: Rebekka Burkholz, Alina Dubatovka

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We train fully-connected ReLU feedforward networks of different depth consisting of L = 1, ..., 10 hidden layers with the same number of neurons N_l = N = 100, 300, 500 and an additional softmax classification layer on MNIST [10] and CIFAR-10 [9] to compare three different initialization schemes: the standard He initialization and our two proposals in Sec. 3, i.e., GSM and orthogonal weights. (See the initialization sketch after the table.)
Researcher Affiliation | Academia | Rebekka Burkholz, Department of Biostatistics, Harvard T.H. Chan School of Public Health, 655 Huntington Avenue, Boston, MA 02115, rburkholz@hsph.harvard.edu; Alina Dubatovka, Department of Computer Science, ETH Zurich, Universitätstrasse 6, 8092 Zurich, alina.dubatovka@inf.ethz.ch
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper neither states that source code for the described methodology is released nor links to a code repository.
Open Datasets | Yes | We train fully-connected ReLU feedforward networks of different depth... on MNIST [10] and CIFAR-10 [9]
Dataset Splits | No | The paper uses MNIST and CIFAR-10 but does not specify training/validation/test splits, split percentages, or how samples were divided, which limits reproducibility.
Hardware Specification | Yes | Each experiment on MNIST was run on 1 Nvidia GTX 1080 Ti GPU, while each experiment on CIFAR-10 was performed on 4 Nvidia GTX 1080 Ti GPUs.
Software Dependencies | No | The paper does not specify version numbers for any software dependencies, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | We train fully-connected ReLU feedforward networks of different depth consisting of L = 1, ..., 10 hidden layers with the same number of neurons N_l = N = 100, 300, 500 and an additional softmax classification layer... We focus on minimizing the cross-entropy by Stochastic Gradient Descent (SGD) without batch normalization or any data augmentation techniques... we adapt the learning rate to (0.0001 + 0.003 exp(-step/10^4))/L for MNIST and (0.00001 + 0.0005 exp(-step/10^4))/L for CIFAR-10 for 10^4 SGD steps with a batch size of 100 in all cases. (See the training sketch after the table.)
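
The architecture and initialization comparison quoted in the Research Type row can be illustrated with a short sketch. This is not the authors' code: it assumes PyTorch (the paper does not name a framework) and shows the described fully-connected ReLU network with the two standard schemes, He and scaled-orthogonal initialization. The paper's GSM proposal is defined in its Sec. 3 and is not reproduced here.

```python
# Minimal sketch (assumed PyTorch, not the authors' code): a fully-connected
# ReLU network with depth_L hidden layers of width_N units each, plus a final
# classification layer; the softmax is applied implicitly by the loss.
import math
import torch.nn as nn

def make_relu_mlp(depth_L, width_N=100, in_dim=784, n_classes=10, init="orthogonal"):
    layers, dims = [], [in_dim] + [width_N] * depth_L
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        lin = nn.Linear(d_in, d_out)
        if init == "he":
            # He initialization: weights ~ N(0, 2 / fan_in)
            nn.init.kaiming_normal_(lin.weight, nonlinearity="relu")
        elif init == "orthogonal":
            # scaled-orthogonal weights; gain sqrt(2) compensates for ReLU
            nn.init.orthogonal_(lin.weight, gain=math.sqrt(2.0))
        # (the paper's GSM scheme from Sec. 3 is not reproduced here)
        nn.init.zeros_(lin.bias)
        layers += [lin, nn.ReLU()]
    layers.append(nn.Linear(width_N, n_classes))  # classification layer
    return nn.Sequential(*layers)
```

For example, `make_relu_mlp(depth_L=10, width_N=300, init="he")` would correspond to one of the MNIST configurations described above.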
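
The Experiment Setup row can likewise be turned into a hedged training sketch. It again assumes PyTorch, and `train_loader` is a hypothetical loader yielding batches of 100 labelled MNIST images; only the MNIST learning-rate constants are shown, the CIFAR-10 schedule differs only in those constants.

```python
# Hypothetical training sketch of the quoted MNIST setup: plain SGD on the
# cross-entropy, batch size 100, 10^4 steps, no batch norm or augmentation,
# learning rate (0.0001 + 0.003 * exp(-step / 10^4)) / L updated every step.
import math
import torch

def train_mnist(model, train_loader, depth_L, n_steps=10_000):
    loss_fn = torch.nn.CrossEntropyLoss()  # softmax + cross-entropy
    opt = torch.optim.SGD(model.parameters(), lr=1e-4)  # placeholder lr, overwritten below
    step = 0
    while step < n_steps:
        for x, y in train_loader:  # batches of 100 assumed
            lr = (0.0001 + 0.003 * math.exp(-step / 1e4)) / depth_L
            for group in opt.param_groups:  # apply the step-wise schedule
                group["lr"] = lr
            opt.zero_grad()
            loss = loss_fn(model(x.flatten(1)), y)  # flatten images for the MLP
            loss.backward()
            opt.step()
            step += 1
            if step >= n_steps:
                break
    return model
```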