Numerical influence of ReLU’(0) on backpropagation
Authors: David Bertoin, Jérôme Bolte, Sébastien Gerchinovitz, Edouard Pauwels
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We investigate the importance of the value of ReLU'(0) for several precision levels (16, 32, 64 bits), on various networks (fully connected, VGG, ResNet) and datasets (MNIST, CIFAR10, SVHN, ImageNet). We observe considerable variations of backpropagation outputs which occur around half of the time in 32-bit precision. (See the ReLU'(0) sketch after the table.) |
| Researcher Affiliation | Collaboration | David Bertoin, IRT Saint Exupéry, ISAE-SUPAERO, ANITI, Toulouse, France (david.bertoin@irt-saintexupery.com); Jérôme Bolte, Toulouse School of Economics, Université Toulouse 1 Capitole, ANITI, Toulouse, France (jbolte@ut-capitole.fr); Sébastien Gerchinovitz, IRT Saint Exupéry, Institut de Mathématiques de Toulouse, ANITI, Toulouse, France (sebastien.gerchinovitz@irt-saintexupery.com); Edouard Pauwels, CNRS, IRIT, Université Paul Sabatier, ANITI, Toulouse, France (edouard.pauwels@irit.fr) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks labeled 'Algorithm' or 'Pseudocode'. |
| Open Source Code | Yes | All our experiments are done using PyTorch [26]; we provide the code to generate all figures presented in this manuscript. |
| Open Datasets | Yes | MNIST dataset [24], CIFAR10 dataset [23], SVHN [25], and ImageNet [12] |
| Dataset Splits | No | The paper references well-known datasets (e.g., MNIST, CIFAR10, ImageNet) but does not explicitly state the percentages or sample counts of the train/validation/test splits needed to reproduce the data partitioning. It mentions 'test accuracy' and 'training loss' but gives no details about a validation split. |
| Hardware Specification | No | The paper mentions that some experiments were run 'on a CPU' and others 'on GPU' but does not provide specific details such as CPU models, GPU models (e.g., NVIDIA A100), or other hardware specifications like memory or number of cores. |
| Software Dependencies | No | The paper mentions using PyTorch [26], TensorFlow [2], JAX [10], and the optuna library [3]. However, it does not provide specific version numbers for any of these software components. |
| Experiment Setup | Yes | We initialized two fully connected neural networks f0 and f1 of size 784 × 2000 × 128 × 10 with the same weights... with the same sequence of mini-batches (B_k)_{k∈ℕ} (mini-batch size 128), using the recursion in (4) for s = 0 and s = 1 and with a fixed α_k = 1, and γ chosen uniformly at random in [0.01, 0.012]. (See the training-loop sketch after the table.) |
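
The experiments in the 'Research Type' row hinge on being able to choose the value returned for ReLU'(0) during backpropagation. Below is a minimal sketch of how this can be done in PyTorch by overriding the backward pass with `torch.autograd.Function`; the class name `ReLUAlpha` and the argument `s` are illustrative and not taken from the authors' released code.

```python
import torch

class ReLUAlpha(torch.autograd.Function):
    """ReLU whose backward pass returns a chosen value s at 0,
    instead of PyTorch's default choice ReLU'(0) = 0."""

    @staticmethod
    def forward(ctx, input, s):
        ctx.save_for_backward(input)
        ctx.s = s  # plain Python float, no gradient needed
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        (input,) = ctx.saved_tensors
        # derivative: 1 where input > 0, s where input == 0, 0 elsewhere
        grad = torch.where(input > 0, torch.ones_like(input),
                           torch.where(input == 0,
                                       torch.full_like(input, ctx.s),
                                       torch.zeros_like(input)))
        return grad_output * grad, None  # no gradient w.r.t. s
```

With `s = 0` this matches the default backward behaviour of `torch.relu`; with any other `s`, every pre-activation that is exactly zero contributes differently to the gradient, which is the effect the paper measures across precision levels.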
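And a hedged sketch of the divergence experiment from the 'Experiment Setup' row: two 784-2000-128-10 networks with identical initial weights, trained on the same mini-batch sequence with s = 0 and s = 1 respectively. The architecture, batch size, and fixed step size follow the row above; the random stand-in batch and all other names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Fully connected 784-2000-128-10 network using ReLUAlpha (above)."""
    def __init__(self, s):
        super().__init__()
        self.s = s
        self.fc1 = nn.Linear(784, 2000)
        self.fc2 = nn.Linear(2000, 128)
        self.fc3 = nn.Linear(128, 10)

    def forward(self, x):
        x = ReLUAlpha.apply(self.fc1(x), self.s)
        x = ReLUAlpha.apply(self.fc2(x), self.s)
        return self.fc3(x)

torch.manual_seed(0)
f0, f1 = MLP(s=0.0), MLP(s=1.0)
f1.load_state_dict(f0.state_dict())  # identical initial weights

# stand-in for an MNIST loader yielding mini-batches of size 128
loader = [(torch.randn(128, 784), torch.randint(0, 10, (128,)))]

loss_fn = nn.CrossEntropyLoss()
for x, y in loader:  # same mini-batch sequence for both networks
    for net in (f0, f1):
        net.zero_grad()
        loss_fn(net(x), y).backward()
        with torch.no_grad():
            for p in net.parameters():
                p -= 1.0 * p.grad  # fixed step size alpha_k = 1

# how far the two weight trajectories have drifted apart
drift = max((p0 - p1).abs().max().item()
            for p0, p1 in zip(f0.parameters(), f1.parameters()))
print(f"max |f0 - f1| over all weights: {drift:.3e}")
```

Note that with this random stand-in batch the pre-activations are unlikely to hit exactly 0, so the two trajectories will typically coincide; the sketch only shows the mechanics. The paper's point is that on real training runs at 16- and 32-bit precision, rounding produces exact zeros often enough for the s = 0 and s = 1 trajectories to diverge.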