Training Very Deep Networks

Authors: Rupesh K. Srivastava, Klaus Greff, Jürgen Schmidhuber

NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental (3 experiments) | "All networks were trained using SGD with momentum. An exponentially decaying learning rate was used in Section 3.1. For the rest of the experiments, a simpler commonly used strategy was employed where the learning rate starts at a value λ and decays according to a fixed schedule by a factor γ. λ, γ and the schedule were selected once based on validation set performance on the CIFAR-10 dataset, and kept fixed for all experiments." (A sketch of this schedule appears after the table.)
Researcher Affiliation | Academia | "The Swiss AI Lab IDSIA / USI / SUPSI {rupesh, klaus, juergen}@idsia.ch"
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Source code, hyperparameter search results and related scripts are publicly available at http://people.idsia.ch/~rupesh/very_deep_learning/."
Open Datasets | Yes | "We trained both plain and highway networks of varying depths on the MNIST digit classification dataset. ... Experiments on CIFAR-10 and CIFAR-100 Object Recognition"
Dataset Splits | Yes | "λ, γ and the schedule were selected once based on validation set performance on the CIFAR-10 dataset, and kept fixed for all experiments."
Hardware Specification | No | The paper thanks "NVIDIA Corporation for their donation of GPUs" but does not specify GPU models or any other hardware details used to run the experiments.
Software Dependencies | No | The paper states that "Experiments were conducted using Caffe [33] and Brainstorm (https://github.com/IDSIA/brainstorm) frameworks" but does not give version numbers for these frameworks or for any other software dependencies.
Experiment Setup | Yes | "All networks were trained using SGD with momentum. ... hyperparameters: initial learning rate, momentum, learning rate exponential decay factor & activation function (either rectified linear or tanh). For highway networks, an additional hyperparameter was the initial value for the transform gate bias (between -1 and -10)." (A sketch of a highway layer with this bias initialization appears after the table.)
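
The training schedule quoted in the Research Type row can be made concrete with a short sketch. The following is a minimal, illustrative PyTorch reconstruction: the specific values for λ (lr0), γ (gamma), the momentum, and the decay milestones are assumptions for the example, since the paper selected them once on a CIFAR-10 validation set and does not report them in the text quoted above.

```python
# Illustrative sketch of "SGD with momentum, lr starts at lambda and
# decays by a factor gamma on a fixed schedule". Concrete values below
# (lr0, gamma, milestones, momentum) are assumptions, not the paper's.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 10))  # placeholder network

lr0, gamma = 0.1, 0.1                      # assumed lambda and gamma
milestones = [80, 120]                     # assumed epoch schedule

optimizer = torch.optim.SGD(model.parameters(), lr=lr0, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=milestones, gamma=gamma)

for epoch in range(160):
    # ... one pass over the training set, calling optimizer.step() ...
    scheduler.step()                       # multiply lr by gamma at each milestone
```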
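
Similarly, the transform gate bias initialization mentioned in the Experiment Setup row can be sketched as a highway layer. This is an illustrative PyTorch implementation of the paper's coupled formulation y = H(x)·T(x) + x·(1 − T(x)); the layer width and the concrete bias value of -2.0 are assumptions for the example (the paper searched initial biases between -1 and -10).

```python
# Illustrative highway layer: a negative transform-gate bias makes the
# gate initially close to 0, so the layer starts out carrying x through.
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    def __init__(self, dim, gate_bias=-2.0):  # gate_bias value is an assumption
        super().__init__()
        self.H = nn.Linear(dim, dim)           # plain transform H(x, W_H)
        self.T = nn.Linear(dim, dim)           # transform gate T(x, W_T)
        nn.init.constant_(self.T.bias, gate_bias)

    def forward(self, x):
        h = torch.relu(self.H(x))              # ReLU or tanh, per the paper
        t = torch.sigmoid(self.T(x))
        return h * t + x * (1.0 - t)           # coupled carry gate C = 1 - T

layer = HighwayLayer(64)                       # assumed width
y = layer(torch.randn(8, 64))                  # sample usage
```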