Gradient Starvation: A Learning Proclivity in Neural Networks
Authors: Mohammad Pezeshki, Sékou-Oumar Kaba, Yoshua Bengio, Aaron C. Courville, Doina Precup, Guillaume Lajoie
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate our findings with simple and real-world out-of-distribution (OOD) generalization experiments. ... We support our findings with extensive empirical results on a variety of classification and adversarial attack tasks. |
| Researcher Affiliation | Collaboration | Mohammad Pezeshki (1,2), Sékou-Oumar Kaba (1,3), Yoshua Bengio (1,2), Aaron Courville (1,2), Doina Precup (1,3,4), Guillaume Lajoie (1,2); 1: Mila, 2: Université de Montréal, 3: McGill University, 4: Google DeepMind |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All code and experiment details available at GitHub repository. |
| Open Datasets | Yes | CIFAR-10, CIFAR-100, and CIFAR-2 (cats vs dogs of CIFAR-10) [57]... Colored MNIST Dataset, proposed in [9]. (A hypothetical construction sketch for CIFAR-2 follows the table.) |
| Dataset Splits | No | The paper refers to 'Train' and 'Test' sets and IID/OOD splits but does not describe a validation split (e.g., a percentage, sample counts, or any explicit mention of 'validation' in the data-split context). |
| Hardware Specification | No | The paper thanks 'Calcul Québec and Compute Canada for providing us with the computing resources' but does not specify hardware details such as GPU models, CPU types, or memory used for the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., Python 3.8, PyTorch 1.9) required to replicate the experiments. |
| Experiment Setup | Yes | For more details including the scheme for hyper-parameter tuning, see App. B. ... A two-layer ReLU network with 500 hidden units is trained with cross-entropy loss for two different arrangements of the training points. ... we conduct a classification experiment on CIFAR-10, CIFAR-100, and CIFAR-2 (cats vs dogs of CIFAR-10) [57] using a convolutional network with ReLU non-linearity. (A minimal sketch of the two-layer network follows the table.) |
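To make the 'Open Datasets' row concrete, here is a minimal sketch of how the CIFAR-2 task could be built from CIFAR-10. The paper only states that CIFAR-2 is "cats vs dogs of CIFAR-10"; the torchvision-based filtering and label remapping below are assumptions, not the authors' released code (the class indices 3 and 5 are the standard CIFAR-10 labels for cat and dog).

```python
# Hypothetical construction of CIFAR-2 (cats vs dogs) from CIFAR-10.
# This is a sketch under stated assumptions, not the paper's script.
from torch.utils.data import Subset
from torchvision import datasets, transforms

CAT, DOG = 3, 5  # standard CIFAR-10 class indices for cat and dog

def cifar2(root: str = "./data", train: bool = True) -> Subset:
    """Return the cat/dog subset of CIFAR-10 with labels remapped to {0, 1}."""
    full = datasets.CIFAR10(root, train=train, download=True,
                            transform=transforms.ToTensor())
    keep = [i for i, y in enumerate(full.targets) if y in (CAT, DOG)]
    # Remap labels in place: cat -> 0, dog -> 1 (only kept indices matter).
    full.targets = [0 if y == CAT else 1 for y in full.targets]
    return Subset(full, keep)
```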
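Likewise, a minimal sketch of the two-layer ReLU network from the 'Experiment Setup' row. The 500 hidden units and cross-entropy loss come from the paper; the 2-D input size, the SGD optimizer, and the learning rate are placeholders, since the paper defers hyper-parameter details to App. B.

```python
# Sketch of the two-layer ReLU network (500 hidden units, cross-entropy).
# Input width, optimizer, and learning rate are assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(2, 500),   # 2-D inputs assumed for the toy arrangements of points
    nn.ReLU(),
    nn.Linear(500, 2),   # two-class logits
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

def train_step(x: torch.Tensor, y: torch.Tensor) -> float:
    """One gradient step on a batch of training points."""
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```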