Gradient Starvation: A Learning Proclivity in Neural Networks

Authors: Mohammad Pezeshki, Oumar Kaba, Yoshua Bengio, Aaron C. Courville, Doina Precup, Guillaume Lajoie

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We illustrate our findings with simple and real-world out-of-distribution (OOD) generalization experiments. ... We support our findings with extensive empirical results on a variety of classification and adversarial attack tasks."
Researcher Affiliation | Collaboration | Mohammad Pezeshki (1,2), Sékou-Oumar Kaba (1,3), Yoshua Bengio (1,2), Aaron Courville (1,2), Doina Precup (1,3,4), Guillaume Lajoie (1,2); 1 Mila, 2 Université de Montréal, 3 McGill University, 4 Google DeepMind
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | All code and experiment details are available in a GitHub repository.
Open Datasets | Yes | "CIFAR-10, CIFAR-100, and CIFAR-2 (cats vs. dogs of CIFAR-10) [57] ... Colored MNIST Dataset, proposed in [9]."
Dataset Splits | No | The paper refers to 'Train' and 'Test' sets and IID/OOD splits but does not explicitly detail a validation split (e.g., percentages, sample counts, or an explicit mention of 'validation' in the context of data splits).
Hardware Specification | No | The paper thanks 'Calcul Québec and Compute Canada for providing us with the computing resources' but does not specify hardware details such as GPU models, CPU types, or memory used for the experiments.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) needed to replicate the experiments.
Experiment Setup | Yes | "For more details including the scheme for hyper-parameter tuning, see App. B. ... A two-layer ReLU network with 500 hidden units is trained with cross-entropy loss for two different arrangements of the training points. ... we conduct a classification experiment on CIFAR-10, CIFAR-100, and CIFAR-2 (cats vs. dogs of CIFAR-10) [57] using a convolutional network with ReLU non-linearity."
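The simplest setup quoted above (a two-layer ReLU network with 500 hidden units trained with cross-entropy) can be sketched as follows. This is a minimal illustrative NumPy implementation, not the paper's released code: the toy two-blob dataset, initialization scale, learning rate, and step count are all assumptions made here for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D binary classification data: two Gaussian blobs, one per class.
# (Illustrative assumption; the paper uses its own 2-D point arrangements.)
n = 100
X = np.vstack([rng.normal(2.0, 1.0, (n, 2)), rng.normal(-2.0, 1.0, (n, 2))])
y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
N = 2 * n

# Two-layer ReLU network: 2 -> 500 -> 2, trained with softmax cross-entropy.
h = 500
W1 = rng.normal(0.0, 0.1, (2, h)); b1 = np.zeros(h)
W2 = rng.normal(0.0, 0.1, (h, 2)); b2 = np.zeros(2)
lr = 0.1

losses = []
for step in range(200):
    # Forward pass.
    z1 = X @ W1 + b1
    a1 = np.maximum(z1, 0.0)                      # ReLU non-linearity
    logits = a1 @ W2 + b2

    # Softmax cross-entropy loss (with max-subtraction for stability).
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    loss = -np.log(p[np.arange(N), y]).mean()
    losses.append(loss)

    # Backward pass (manual gradients of mean cross-entropy).
    d_logits = p.copy()
    d_logits[np.arange(N), y] -= 1.0
    d_logits /= N
    dW2 = a1.T @ d_logits; db2 = d_logits.sum(axis=0)
    da1 = d_logits @ W2.T
    dz1 = da1 * (z1 > 0)                          # ReLU gradient mask
    dW1 = X.T @ dz1; db1 = dz1.sum(axis=0)

    # Plain gradient-descent update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

On this linearly separable toy task the training loss drops rapidly; the paper's point is about *which* features such a network learns first, which this sketch only sets the stage for.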