Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)
Authors: Djork-Arné Clevert, Thomas Unterthiner, Sepp Hochreiter
ICLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, ELUs lead not only to faster learning, but also to significantly better generalization performance than ReLUs and LReLUs on networks with more than 5 layers. On CIFAR-100, ELU networks significantly outperform ReLU networks with batch normalization, while batch normalization does not improve ELU networks. ELU networks are among the top 10 reported CIFAR-10 results and yield the best published result on CIFAR-100, without resorting to multi-view evaluation or model averaging. On ImageNet, ELU networks considerably speed up learning compared to a ReLU network with the same architecture, obtaining less than 10% classification error for a single-crop, single-model network. (The ELU activation itself is sketched below the table.) |
| Researcher Affiliation | Academia | Djork-Arné Clevert, Thomas Unterthiner & Sepp Hochreiter, Institute of Bioinformatics, Johannes Kepler University, Linz, Austria. {okko,unterthiner,hochreit}@bioinf.jku.at |
| Pseudocode | No | The paper presents mathematical formulations and derivations, but no pseudocode or algorithm blocks are included. |
| Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the described methodology. |
| Open Datasets | Yes | The following benchmark datasets are used: (i) MNIST (gray images in 10 classes, 60k train and 10k test), (ii) CIFAR-10 (color images in 10 classes, 50k train and 10k test), (iii) CIFAR-100 (color images in 100 classes, 50k train and 10k test), and (iv) ImageNet (color images in 1,000 classes, 1.3M train and 100k test). |
| Dataset Splits | Yes | (i) MNIST (gray images in 10 classes, 60k train and 10k test), (ii) CIFAR-10 (color images in 10 classes, 50k train and 10k test), (iii) CIFAR-100 (color images in 100 classes, 50k train and 10k test), and (iv) ImageNet (color images in 1,000 classes, 1.3M train and 100k test). ... It contains about 1.3M training color images as well as an additional 50k images and 100k images for validation and testing, respectively. |
| Hardware Specification | Yes | Acknowledgment. We thank the NVIDIA Corporation for supporting this research with several Titan X GPUs and Roland Vollgraf and Martin Heusel for helpful discussions and comments on this work. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers for replication. |
| Experiment Setup | Yes | Each network had eight hidden layers of 128 units each, and was trained for 300 epochs by stochastic gradient descent with learning rate 0.01 and mini-batches of size 64. The weights have been initialized according to (He et al., 2015). ... For network regularization we used the following drop-out rates for the last layer of each stack (0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.0). The L2-weight decay regularization term was set to 0.0005. The following learning rate schedule was applied (0-35k [0.01], 35k-85k [0.005], 85k-135k [0.0005], 135k-165k [0.00005]) (iterations [learning rate]). The momentum term was fixed to 0.9. ... The initial learning rate was set to 0.01 and decreased by a factor of 10 after 35k iterations. The minibatch size was 100. ... For network regularization we set the L2-weight decay term to 0.0005 and used 50% drop-out in the two penultimate FC layers. (The MNIST and CIFAR training configurations are sketched below the table.) |
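
For reference, the ELU activation compared throughout the table is defined in the paper as f(x) = x for x > 0 and α(exp(x) − 1) otherwise, with α = 1 used in the experiments. Below is a minimal NumPy sketch of ELU next to the ReLU and leaky ReLU baselines; the 0.1 negative slope in the leaky ReLU is chosen here for illustration.

```python
import numpy as np

def elu(x, alpha=1.0):
    """Exponential Linear Unit: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def relu(x):
    """ReLU baseline: max(0, x)."""
    return np.maximum(0.0, x)

def lrelu(x, slope=0.1):
    """Leaky ReLU baseline with a small negative-part slope (0.1 here for illustration)."""
    return np.where(x > 0, x, slope * x)

x = np.linspace(-3.0, 3.0, 7)
print(elu(x))    # saturates toward -alpha for strongly negative inputs
print(relu(x))   # hard zero for negative inputs
print(lrelu(x))  # small negative slope instead of zero
```

Unlike ReLU, the negative part of ELU saturates at −α, which is the property the paper credits for pushing mean activations toward zero and speeding up learning.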
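To make the MNIST setup quoted in the Experiment Setup row concrete, here is a minimal PyTorch sketch (framework choice, module names, and helper function are ours, not the paper's): eight fully connected hidden layers of 128 units with ELU activations, He initialization, and plain SGD with learning rate 0.01 on mini-batches of 64.

```python
import torch
import torch.nn as nn

def make_mlp(n_in=784, n_hidden=128, n_layers=8, n_classes=10, alpha=1.0):
    """Fully connected network: eight hidden layers of 128 units, ELU activations."""
    layers, width = [], n_in
    for _ in range(n_layers):
        linear = nn.Linear(width, n_hidden)
        nn.init.kaiming_normal_(linear.weight, nonlinearity="relu")  # He et al. (2015) init
        layers += [linear, nn.ELU(alpha=alpha)]
        width = n_hidden
    layers.append(nn.Linear(width, n_classes))
    return nn.Sequential(*layers)

model = make_mlp()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # trained for 300 epochs in the paper
loss_fn = nn.CrossEntropyLoss()

# One illustrative step with a random mini-batch of 64 flattened 28x28 images.
x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```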
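The CIFAR optimizer settings quoted in the same row (momentum 0.9, L2 weight decay 0.0005, piecewise-constant learning rate schedule over 165k iterations, mini-batches of 100) can be sketched the same way; the placeholder model below stands in for the paper's convolutional architecture, which is not reproduced here.

```python
import torch

# Placeholder model; the paper's CIFAR conv-net stacks are not reproduced in this sketch.
model = torch.nn.Linear(3 * 32 * 32, 100)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=0.0005)

def lr_at(iteration):
    """Piecewise-constant schedule from the quoted setup:
    0-35k: 0.01, 35k-85k: 0.005, 85k-135k: 0.0005, 135k-165k: 0.00005."""
    if iteration < 35_000:
        return 0.01
    if iteration < 85_000:
        return 0.005
    if iteration < 135_000:
        return 0.0005
    return 0.00005

# Before each training step, set the learning rate for the current iteration.
for group in optimizer.param_groups:
    group["lr"] = lr_at(0)  # e.g. at iteration 0 the rate is 0.01
```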