Synaptic Weight Distributions Depend on the Geometry of Plasticity
Authors: Roman Pogodin, Jonathan Cornford, Arna Ghosh, Gauthier Gidel, Guillaume Lajoie, Blake Aaron Richards
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we empirically verify our theory under conditions relevant for neuroscientific experiments. We use PyTorch (Paszke et al., 2019) and the FFCV library for fast data loading (Leclerc et al., 2022). The experiments were performed on a local cluster with NVIDIA A100 GPUs. |
| Researcher Affiliation | Academia | Roman Pogodin (McGill & Mila, roman.pogodin@mila.quebec); Jonathan Cornford (McGill & Mila, cornforj@mila.quebec); Arna Ghosh (McGill & Mila); Gauthier Gidel (Université de Montréal & Mila); Guillaume Lajoie (Université de Montréal & Mila); Blake Aaron Richards (McGill, Mila & CIFAR) |
| Pseudocode | No | The paper provides mathematical derivations and equations but no explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at github.com/romanpogodin/synaptic-weight-distr. |
| Open Datasets | Yes | We use networks pretrained on ImageNet (Deng et al., 2009), and finetune them to 100% accuracy on a subset of the ImageNet validation set. We have conducted experiments with recurrent neural networks trained on row-wise sequential MNIST (LeCun et al., 2010). In Section 4.4, we used the data from Dorkenwald et al. (2022). |
| Dataset Splits | Yes | We use networks pretrained on ImageNet, and finetune them to 100% accuracy on a subset of the ImageNet validation set. The networks were trained on the train set, and then finetuned on a subset of the test set (same procedure as for deep networks) for N = D^0.5 (the number of weights scales quadratically with the hidden size, so N equals the number of hidden units). |
| Hardware Specification | Yes | The experiments were performed on a local cluster with NVIDIA A100 GPUs. |
| Software Dependencies | No | We use PyTorch (Paszke et al., 2019) and the FFCV library for fast data loading (Leclerc et al., 2022). The libraries are named, but no version numbers or environment specification are given. |
| Experiment Setup | Yes | The learning rate was changed during training according to a cosine annealing schedule. The initial learning rate was chosen on a single seed via grid search over 16 points (log10-spaced from 1e-7 to 1e-1) and 100 epochs. Networks were trained on the cross-entropy loss using stochastic mirror descent with gradient momentum of 0.9, no weight decay, and a batch size of 256. The initial number of epochs was 30, increased by 30 up to 4 times if accuracy was below 100%. The dataset was not augmented; all images were center-cropped to a resolution of 224 pixels and normalized by the standard ImageNet mean/variance. Hedged sketches of this setup appear below the table. |
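
The setup cell names stochastic mirror descent as the optimizer. Below is a minimal sketch of one such step under an elementwise p-norm potential ψ(w) = (1/p) Σᵢ |wᵢ|^p; the potential choice, the `pnorm_smd_step` helper, and the placement of momentum are illustrative assumptions rather than the authors' released implementation (see github.com/romanpogodin/synaptic-weight-distr for that).

```python
import torch

def mirror_map(w: torch.Tensor, p: float) -> torch.Tensor:
    """Gradient of the elementwise potential psi(w) = (1/p) * sum_i |w_i|^p."""
    return w.sign() * w.abs().pow(p - 1)

def inverse_mirror_map(z: torch.Tensor, p: float) -> torch.Tensor:
    """Inverse of mirror_map: maps dual variables back to weight space."""
    return z.sign() * z.abs().pow(1.0 / (p - 1))

@torch.no_grad()
def pnorm_smd_step(w, grad, buf, lr=1e-3, momentum=0.9, p=3.0):
    """One hypothetical stochastic mirror descent step with momentum.

    The step runs in dual space: z = mirror_map(w), z <- z - lr * m,
    w <- inverse_mirror_map(z), where m is the momentum-averaged gradient.
    """
    buf.mul_(momentum).add_(grad)        # momentum on raw gradients (assumption)
    z = mirror_map(w, p) - lr * buf      # gradient step in the dual space
    w.copy_(inverse_mirror_map(z, p))    # map back to the primal (weight) space
    return w, buf

# Toy usage: one step on a single weight tensor.
w = torch.randn(10)
grad = torch.randn(10)                   # stand-in for a cross-entropy gradient
buf = torch.zeros_like(w)
w, buf = pnorm_smd_step(w, grad, buf)
```

With p = 2 the mirror map is the identity and the step reduces to ordinary SGD with momentum, which makes a convenient sanity check.

The remaining details in the setup cell (the 16-point log-spaced learning-rate grid, cosine annealing, and the no-augmentation preprocessing) map onto standard PyTorch; the sketch below is an assumed reconstruction, with `torchvision` transforms standing in for the paper's FFCV loading pipeline and `SGD` standing in for mirror descent.

```python
import numpy as np
import torch
from torchvision import transforms

# 16-point grid for the initial learning rate, log10-spaced from 1e-7 to 1e-1.
lr_grid = np.logspace(-7, -1, num=16)

# Preprocessing as quoted: no augmentation, center crop to 224 pixels,
# normalization by the standard ImageNet mean/variance.
preprocess = transforms.Compose([
    transforms.Resize(256),              # assumed resize before the crop
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Cosine annealing over the current epoch budget; the budget starts at 30
# epochs and is extended by 30 (up to 4 times) if accuracy stays below 100%.
model = torch.nn.Linear(224 * 224 * 3, 1000)   # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=float(lr_grid[0]),
                            momentum=0.9, weight_decay=0.0)
epochs = 30
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
```

In the quoted procedure, each grid point would be run for 100 epochs on a single seed before the best initial learning rate is kept.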