Natural Neural Networks

Authors: Guillaume Desjardins, Karen Simonyan, Razvan Pascanu, koray kavukcuoglu

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We highlight the benefits of our method on both unsupervised and supervised learning tasks, and showcase its scalability by training on the large-scale ImageNet Challenge dataset." "We begin with a set of diagnostic experiments which highlight the effectiveness of our method at improving conditioning. We also illustrate the impact of the hyper-parameters T and ε, controlling the frequency of the reparametrization and the size of the trust region. Section 4.2 evaluates PRONG on unsupervised learning problems, where models are both deep and fully connected. Section 4.3 then moves onto large convolutional models for image classification."
Researcher Affiliation | Industry | Google DeepMind, London; {gdesjardins,simonyan,razp,korayk}@google.com
Pseudocode | Yes | "Algorithm 1 Projected Natural Gradient Descent" (a hedged sketch of the reparametrization step behind this algorithm follows the table)
Open Source Code | No | The paper does not provide any explicit statement or link regarding the availability of open-source code for the described methodology.
Open Datasets | Yes | "scaling our method from standard deep auto-encoders to large convolutional models on ImageNet [20], trained across multiple GPUs. This is to our knowledge the first time a (non-diagonal) natural gradient algorithm is scaled to problems of this magnitude." "We train a small 3-layer MLP with tanh non-linearities, on a downsampled version of MNIST (10x10) [11]." "Results are presented on CIFAR-10 [9] and the ImageNet Challenge (ILSVRC12) datasets [20]."
Dataset Splits | Yes | "Model selection was performed on a held-out validation set of 5k examples." "On CIFAR-10, PRONG achieves better test error and converges faster. On ImageNet, PRONG+ achieves comparable validation error while maintaining a faster convergence rate."
Hardware Specification | No | "trained across multiple GPUs" and "Eight GPUs were used for computing gradients and estimating model statistics." While GPUs are mentioned, no specific models or other hardware details (CPU, RAM) are provided.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | "Figures 3a-3b highlight the effect of the eigenvalue regularization term ε and the reparametrization interval T." "Note that these timing numbers reflect performance under the optimal choice of hyper-parameters, which in the case of batch normalization yielded a batch size of 256, compared to 128 for all other methods." "The model was trained on 24×24 random crops with random horizontal reflections." (a crop-and-flip sketch is shown after the table)
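To make the Pseudocode row more concrete, below is a minimal NumPy sketch of the whitening and re-projection step that Projected Natural Gradient Descent repeats every T updates, assuming each layer computes f(V U (x - c) + d) with U and c the layer's whitening matrix and input mean. The function names and the eigendecomposition-based inverse square root are illustrative assumptions, not code from the paper.

```python
import numpy as np

def estimate_whitening(acts, eps):
    """Whitening transform U = (Sigma + eps*I)^(-1/2) and mean c estimated
    from a batch of layer activations (one example per row). Names and the
    eigendecomposition route are assumed for illustration."""
    c = acts.mean(axis=0)
    centered = acts - c
    sigma = centered.T @ centered / acts.shape[0]
    w, Q = np.linalg.eigh(sigma + eps * np.eye(sigma.shape[0]))
    U = Q @ np.diag(w ** -0.5) @ Q.T
    return U, c

def reproject_layer(V, d, U_old, c_old, U_new, c_new):
    """Re-express the trainable parameters (V, d) in the new whitened basis so
    that V_new U_new (x - c_new) + d_new equals V U_old (x - c_old) + d for all x."""
    W = V @ U_old                        # canonical (unwhitened) weights
    V_new = W @ np.linalg.inv(U_new)
    d_new = d + W @ (c_new - c_old)
    return V_new, d_new

# Schematic amortized loop: every T SGD updates, re-estimate (U, c) from a
# batch of activations feeding each layer, then re-project that layer's (V, d).
```

Because the re-projection leaves the network function unchanged, only the coordinate system seen by SGD changes, and the eigendecompositions are needed just once every T updates; this amortization is what the quoted experiments on T and ε are probing.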
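The crop-and-flip augmentation quoted in the Experiment Setup row (24×24 random crops with random horizontal reflections) can be sketched as follows; this NumPy version, its function name, and its parameters are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np

def random_crop_and_flip(image, crop=24, rng=np.random):
    """Random crop to crop x crop pixels plus a random horizontal flip.
    `image` is an H x W x C array; crop=24 matches the quoted setup."""
    h, w = image.shape[:2]
    top = rng.randint(0, h - crop + 1)
    left = rng.randint(0, w - crop + 1)
    patch = image[top:top + crop, left:left + crop]
    if rng.rand() < 0.5:
        patch = patch[:, ::-1]           # horizontal reflection
    return patch
```

For 32×32 CIFAR-10 images this yields 24×24 training patches; at evaluation time a fixed central crop would typically be used instead.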