Numerical influence of ReLU’(0) on backpropagation

Authors: David Bertoin, Jérôme Bolte, Sébastien Gerchinovitz, Edouard Pauwels

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We investigate the importance of the value of ReLU'(0) for several precision levels (16, 32, 64 bits), on various networks (fully connected, VGG, ResNet) and datasets (MNIST, CIFAR10, SVHN, ImageNet). We observe considerable variations of backpropagation outputs which occur around half of the time in 32 bits precision. (A hedged PyTorch sketch of varying ReLU'(0) follows the table.)
Researcher Affiliation | Collaboration | David Bertoin, IRT Saint Exupéry, ISAE-SUPAERO, ANITI, Toulouse, France, david.bertoin@irt-saintexupery.com; Jérôme Bolte, Toulouse School of Economics, Université Toulouse 1 Capitole, ANITI, Toulouse, France, jbolte@ut-capitole.fr; Sébastien Gerchinovitz, IRT Saint Exupéry, Institut de Mathématiques de Toulouse, ANITI, Toulouse, France, sebastien.gerchinovitz@irt-saintexupery.com; Edouard Pauwels, CNRS, IRIT, Université Paul Sabatier, ANITI, Toulouse, France, edouard.pauwels@irit.fr
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks labeled 'Algorithm' or 'Pseudocode'.
Open Source Code | Yes | All our experiments are done using PyTorch [26]; we provide the code to generate all figures presented in this manuscript.
Open Datasets | Yes | MNIST dataset [24], CIFAR10 dataset [23], SVHN [25], and ImageNet [12]
Dataset Splits | No | The paper references well-known datasets (e.g., MNIST, CIFAR10, ImageNet) but does not explicitly provide the percentages or sample counts for the train, validation, and test splits needed to reproduce the data partitioning. It mentions 'test accuracy' and 'training loss' but gives no detailed split information for validation.
Hardware Specification | No | The paper mentions that some experiments were run 'on a CPU' and others 'on GPU' but does not provide specifics such as CPU or GPU models (e.g., NVIDIA A100), memory, or number of cores.
Software Dependencies | No | The paper mentions using PyTorch [26], TensorFlow [2], Jax [10], and the optuna library [3], but it does not provide version numbers for any of these software components.
Experiment Setup | Yes | We initialized two fully connected neural networks f_0 and f_1 of size 784-2000-128-10 with the same weights... with the same sequence of mini-batches (B_k)_{k∈ℕ} (mini-batch size 128), using the recursion in (4) for s = 0 and s = 1 and with a fixed α_k = 1, and γ chosen uniformly at random in [0.01, 0.012]. (A hedged sketch of this twin-network setup follows the table.)
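
The core manipulation quoted in the Research Type row, choosing the value assigned to ReLU'(0) during backpropagation, can be reproduced in PyTorch with a custom autograd function. The sketch below is not the authors' released code: the class name ReLUAlpha, the helper backprop_output, and the toy comparison are illustrative assumptions showing how the mechanism could be implemented.

```python
import torch

class ReLUAlpha(torch.autograd.Function):
    """ReLU whose backward pass uses a chosen constant s as the value of ReLU'(0).

    Hypothetical helper for illustration only, not the authors' released code.
    """

    @staticmethod
    def forward(ctx, x, s):
        ctx.save_for_backward(x)
        ctx.s = s
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        grad = torch.zeros_like(x)
        grad[x > 0] = 1.0
        grad[x == 0] = ctx.s             # the only place where s = ReLU'(0) enters
        return grad_output * grad, None  # no gradient with respect to s


def backprop_output(s, x, w, dtype):
    # Gradient of sum(ReLU_s(x @ w)) with respect to the input x, at the given precision.
    x = x.to(dtype).requires_grad_(True)
    w = w.to(dtype)
    ReLUAlpha.apply(x @ w, s).sum().backward()
    return x.grad


# Exact zeros in the pre-activations are what make s matter. A single random layer
# rarely hits them, so one row is forced to zero here to expose the bifurcation;
# the paper shows such exact zeros arise spontaneously during 16- and 32-bit training.
torch.manual_seed(0)
x = torch.randn(128, 784)
x[0] = 0.0                              # force an exactly-zero pre-activation row
w = torch.randn(784, 10)
for dtype in (torch.float32, torch.float64):   # the paper also uses float16 (on GPU)
    g0 = backprop_output(0.0, x, w, dtype)
    g1 = backprop_output(1.0, x, w, dtype)
    print(dtype, "max |grad(s=1) - grad(s=0)|:", (g1 - g0).abs().max().item())
```

The Experiment Setup row quotes a twin-network protocol: two identical 784-2000-128-10 fully connected networks fed the same mini-batch sequence, differing only in the value used for ReLU'(0). The sketch below is a hedged reconstruction, assuming the recursion in (4) amounts to a plain gradient step with step size γ·α_k; make_mlp, the cross-entropy loss, and the synthetic mini-batch are illustrative assumptions, and the ReLU'(0) difference itself would be injected via a custom function such as ReLUAlpha above.

```python
import copy
import random
import torch
import torch.nn as nn

def make_mlp():
    # 784-2000-128-10 fully connected architecture from the quoted setup
    return nn.Sequential(
        nn.Linear(784, 2000), nn.ReLU(),
        nn.Linear(2000, 128), nn.ReLU(),
        nn.Linear(128, 10),
    )

torch.manual_seed(0)
f0 = make_mlp()
f1 = copy.deepcopy(f0)                  # f1 starts from exactly the same weights as f0
gamma = random.uniform(0.01, 0.012)     # gamma drawn uniformly at random in [0.01, 0.012]
alpha_k = 1.0                           # fixed alpha_k = 1
criterion = nn.CrossEntropyLoss()       # assumed loss; not specified in the quoted setup

# One shared mini-batch (size 128); in the experiment the same sequence (B_k) feeds both
# networks, and f0 / f1 backpropagate with ReLU'(0) = 0 and ReLU'(0) = 1 respectively.
xb = torch.randn(128, 784)
yb = torch.randint(0, 10, (128,))
for net in (f0, f1):
    loss = criterion(net(xb), yb)
    net.zero_grad()
    loss.backward()
    with torch.no_grad():               # gradient step, assumed form of recursion (4)
        for p in net.parameters():
            p -= gamma * alpha_k * p.grad
```

With identical initial weights, mini-batch order, and step sizes, any divergence between f_0 and f_1 over training can only come from the choice of ReLU'(0) interacting with exact zeros, which is the effect the paper measures.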