Can Forward Gradient Match Backpropagation?
Authors: Louis Fournier, Stephane Rivaud, Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now describe how we implemented our models, our training procedure, and the implementations of Gradient Targets and Guesses, to study the accuracy of a given model under variations of those parameters. |
| Researcher Affiliation | Academia | ¹Sorbonne Université, CNRS, ISIR, Paris, France; ²CCM, Flatiron Institute, New York, USA; ³MILA, Concordia University, Montréal, Canada. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our source code is available at: github.com/streethagore/ForwardLocalGradient. |
| Open Datasets | Yes | We considered the CIFAR-10 and ImageNet32 datasets, used with standard data augmentation. Chrabaszcz et al. (2017) have demonstrated that, in general, the conclusions drawn from the ImageNet32 dataset are also applicable to the full-resolution ImageNet dataset. |
| Dataset Splits | No | The paper mentions using 'validation accuracy' and 'cross-validate' but does not provide specific details on the dataset splits (percentages or counts) or cite a specific predefined split that defines these. |
| Hardware Specification | No | The paper mentions using 'AI resources of IDRIS' and 'resources from Compute Canada and Calcul Québec' but does not specify exact hardware models such as GPU or CPU types. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as library or framework versions. |
| Experiment Setup | Yes | We followed a standard training procedure: SGD with a momentum of 0.9 and a weight decay of 5×10⁻⁴. For CIFAR-10, we train the model for 100 epochs, with a learning rate decayed by 0.2 every 30 epochs. For ImageNet32, we first use a shorter training of 70 epochs, decaying the learning rate by 0.1 every 20 epochs. The initial learning rate was chosen among {0.05, 0.01, 0.005} for CIFAR-10 and {0.1, 0.05, 0.01, 0.005, 0.0001} for ImageNet32. |
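
The Experiment Setup row fully specifies the optimizer and learning-rate schedule for CIFAR-10. Below is a minimal sketch of that configuration, assuming a PyTorch implementation; the `model` and the choice of 0.05 as the initial learning rate are placeholders for illustration, not values confirmed by the paper.

```python
import torch

# Stand-in model; the actual architectures in the paper are larger networks.
model = torch.nn.Linear(3 * 32 * 32, 10)

# SGD with momentum 0.9 and weight decay 5e-4, as quoted above.
# The initial learning rate 0.05 is one value from the CIFAR-10 grid {0.05, 0.01, 0.005}.
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9, weight_decay=5e-4)

# CIFAR-10 schedule: 100 epochs, learning rate decayed by a factor of 0.2 every 30 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.2)

for epoch in range(100):
    # ... one training epoch over CIFAR-10 goes here ...
    scheduler.step()
```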
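The Open Datasets row says CIFAR-10 is used "with standard data augmentation". The sketch below shows one common reading of that phrase for CIFAR-10 (random crop with padding plus horizontal flip); the exact transforms and normalization statistics are assumptions, not taken from the paper.

```python
from torchvision import datasets, transforms

# Common "standard" CIFAR-10 augmentation: random crop with 4-pixel padding
# plus horizontal flip. The normalization statistics are the usual CIFAR-10
# channel means/stds, assumed rather than quoted from the paper.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=train_transform)
```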
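For context on the paper's subject, the sketch below illustrates the basic forward-gradient mechanism the title refers to: a random tangent (a "gradient guess") is pushed through a forward-mode Jacobian-vector product to obtain a directional derivative, which then scales the guess into a gradient estimate without backpropagation. It uses `torch.func.jvp` (available in PyTorch 2.x) on a toy quadratic; the paper's actual models, local losses, and guess choices are not reproduced here.

```python
import torch
from torch.func import jvp

# Toy quadratic loss standing in for a network's loss; purely illustrative.
def loss_fn(w):
    return (w ** 2).sum()

w = torch.randn(10)

# Gradient guess: a random tangent direction. A plain Gaussian guess is the
# simplest baseline; the paper studies more informative, local guesses.
v = torch.randn_like(w)

# Forward-mode JVP returns the loss and the directional derivative
# <grad_w loss, v>, computed without backpropagation.
loss, dir_deriv = jvp(loss_fn, (w,), (v,))

# Forward-gradient estimate: the guess scaled by the directional derivative.
# Unbiased in expectation over v, but high variance in high dimensions.
forward_grad = dir_deriv * v
```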