Training Very Deep Networks
Authors: Rupesh K. Srivastava, Klaus Greff, Jürgen Schmidhuber
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 3 (Experiments): All networks were trained using SGD with momentum. An exponentially decaying learning rate was used in Section 3.1. For the remaining experiments, a simpler, commonly used strategy was employed in which the learning rate starts at a value λ and decays by a factor γ according to a fixed schedule. λ, γ and the schedule were selected once based on validation set performance on the CIFAR-10 dataset, and kept fixed for all experiments (a schedule sketch follows the table). |
| Researcher Affiliation | Academia | The Swiss AI Lab IDSIA / USI / SUPSI {rupesh, klaus, juergen}@idsia.ch |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code, hyperparameter search results and related scripts are publicly available at http://people.idsia.ch/~rupesh/very_deep_learning/. |
| Open Datasets | Yes | We trained both plain and highway networks of varying depths on the MNIST digit classification dataset. ... Experiments on CIFAR-10 and CIFAR-100 Object Recognition |
| Dataset Splits | Yes | λ, γ and the schedule were selected once based on validation set performance on the CIFAR-10 dataset, and kept fixed for all experiments. |
| Hardware Specification | No | The paper mentions 'NVIDIA Corporation for their donation of GPUs' but does not specify exact GPU models or other detailed hardware specifications for running experiments. |
| Software Dependencies | No | The paper mentions 'Experiments were conducted using Caffe [33] and Brainstorm (https://github.com/IDSIA/brainstorm) frameworks' but does not give version numbers for these frameworks or for other software dependencies. |
| Experiment Setup | Yes | All networks were trained using SGD with momentum. ... hyperparameters: initial learning rate, momentum, learning rate exponential decay factor & activation function (either rectified linear or tanh). For highway networks, an additional hyperparameter was the initial value for the transform gate bias (between -1 and -10); a highway-layer sketch with this initialization follows the table. |
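The learning-rate strategy quoted in the Research Type row is simple enough to sketch. Below is a minimal Python illustration of a step-decay schedule: the rate starts at λ and is multiplied by γ at fixed milestone epochs. The concrete values (`0.1`, milestones at epochs 80 and 120) are assumptions for illustration, not the paper's tuned settings, which were selected once on the CIFAR-10 validation set.

```python
def step_decay_lr(initial_lr, gamma, milestones, epoch):
    """Return the learning rate for `epoch` under a fixed step-decay schedule."""
    lr = initial_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma  # decay by a fixed factor at each milestone
    return lr

# Illustrative usage: lr starts at 0.1 and is cut 10x at epochs 80 and 120
# (these numbers are assumed, not taken from the paper).
for epoch in (0, 79, 80, 120):
    print(epoch, step_decay_lr(0.1, gamma=0.1, milestones=(80, 120), epoch=epoch))
```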
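For the transform gate bias in the Experiment Setup row, here is a minimal PyTorch sketch of a single highway layer, y = H(x)·T(x) + x·(1 − T(x)), with the gate bias initialized to a negative constant so the layer initially carries its input through. The layer width, depth, and the bias value of -2.0 are illustrative assumptions; the paper searched initial bias values between -1 and -10.

```python
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    """One highway layer: y = H(x) * T(x) + x * (1 - T(x))."""

    def __init__(self, dim, gate_bias_init=-2.0):  # -2.0 is an assumed value in [-10, -1]
        super().__init__()
        self.plain = nn.Linear(dim, dim)  # H(x, W_H): the plain transformation
        self.gate = nn.Linear(dim, dim)   # T(x, W_T): the transform gate
        # Negative bias => sigmoid(T) starts near 0, so the layer initially
        # passes its input through almost unchanged (carry behavior).
        nn.init.constant_(self.gate.bias, gate_bias_init)

    def forward(self, x):
        t = torch.sigmoid(self.gate(x))  # transform gate in (0, 1)
        h = torch.relu(self.plain(x))    # rectified-linear block activation
        return h * t + x * (1.0 - t)     # carry gate C = 1 - T

# Illustrative usage: stack 10 highway layers of width 64 (assumed sizes).
model = nn.Sequential(*[HighwayLayer(64) for _ in range(10)])
y = model(torch.randn(8, 64))
```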