Synaptic Weight Distributions Depend on the Geometry of Plasticity
Authors: Roman Pogodin, Jonathan Cornford, Arna Ghosh, Gauthier Gidel, Guillaume Lajoie, Blake Aaron Richards
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we empirically verify our theory under conditions relevant for neuroscientific experiments. We use PyTorch (Paszke et al., 2019) and the FFCV library for fast data loading (Leclerc et al., 2022). The experiments were performed on a local cluster with NVIDIA A100 GPUs. |
| Researcher Affiliation | Academia | Roman Pogodin (McGill & Mila, roman.pogodin@mila.quebec); Jonathan Cornford (McGill & Mila, cornforj@mila.quebec); Arna Ghosh (McGill & Mila); Gauthier Gidel (Université de Montréal & Mila); Guillaume Lajoie (Université de Montréal & Mila); Blake Aaron Richards (McGill, Mila & CIFAR) |
| Pseudocode | No | The paper provides mathematical derivations and equations but no explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at github.com/romanpogodin/synaptic-weight-distr. |
| Open Datasets | Yes | We use networks pretrained on ImageNet (Deng et al., 2009), and finetune them to 100% accuracy on a subset of the ImageNet validation set. We have conducted experiments with recurrent neural networks trained on row-wise sequential MNIST (LeCun et al., 2010). In Section 4.4, we used the data from Dorkenwald et al. (2022). |
| Dataset Splits | Yes | We use networks pretrained on ImageNet, and finetune them to 100% accuracy on a subset of the ImageNet validation set. The networks were trained on the train set, and then finetuned on a subset of the test set (same procedure as for deep networks) for N = D^0.5 (the number of weights scales quadratically with the hidden size, so N equals the number of hidden units). |
| Hardware Specification | Yes | The experiments were performed on a local cluster with NVIDIA A100 GPUs. |
| Software Dependencies | No | We use PyTorch (Paszke et al., 2019) and the FFCV library for fast data loading (Leclerc et al., 2022). The libraries are named, but no version numbers or environment specification are given. |
| Experiment Setup | Yes | The learning rate was changed during training according to a cosine annealing schedule. The initial learning rate was chosen on a single seed via grid search over 16 points (log10-spaced from 1e-7 to 1e-1) and 100 epochs. Networks were trained on the cross-entropy loss using stochastic mirror descent with gradient momentum of 0.9, no weight decay, and a batch size of 256. The initial number of epochs was 30, increased by 30 up to 4 times if accuracy was below 100%. The dataset was not augmented; all images were center-cropped to a resolution of 224 pixels and normalized by the standard ImageNet mean/variance. Hedged sketches of this setup appear below the table. |
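
The setup cell names stochastic mirror descent as the optimizer. Below is a minimal sketch of one such step under an elementwise p-norm potential ψ(w) = (1/p) Σᵢ |wᵢ|^p; the potential choice, the `pnorm_smd_step` helper, and the placement of momentum are illustrative assumptions rather than the authors' released implementation (see github.com/romanpogodin/synaptic-weight-distr for that).

```python
import torch

def mirror_map(w: torch.Tensor, p: float) -> torch.Tensor:
    """Gradient of the elementwise potential psi(w) = (1/p) * sum_i |w_i|^p."""
    return w.sign() * w.abs().pow(p - 1)

def inverse_mirror_map(z: torch.Tensor, p: float) -> torch.Tensor:
    """Inverse of mirror_map: maps dual variables back to weight space."""
    return z.sign() * z.abs().pow(1.0 / (p - 1))

@torch.no_grad()
def pnorm_smd_step(w, grad, buf, lr=1e-3, momentum=0.9, p=3.0):
    """One hypothetical stochastic mirror descent step with momentum.

    The step runs in dual space: z = mirror_map(w), z <- z - lr * m,
    w <- inverse_mirror_map(z), where m is the momentum-averaged gradient.
    """
    buf.mul_(momentum).add_(grad)        # momentum on raw gradients (assumption)
    z = mirror_map(w, p) - lr * buf      # gradient step in the dual space
    w.copy_(inverse_mirror_map(z, p))    # map back to the primal (weight) space
    return w, buf

# Toy usage: one step on a single weight tensor.
w = torch.randn(10)
grad = torch.randn(10)                   # stand-in for a cross-entropy gradient
buf = torch.zeros_like(w)
w, buf = pnorm_smd_step(w, grad, buf)
```

With p = 2 the mirror map is the identity and the step reduces to ordinary SGD with momentum, which makes a convenient sanity check.

The remaining details in the setup cell (the 16-point log-spaced learning-rate grid, cosine annealing, and the no-augmentation preprocessing) map onto standard PyTorch; the sketch below is an assumed reconstruction, with `torchvision` transforms standing in for the paper's FFCV loading pipeline and `SGD` standing in for mirror descent.

```python
import numpy as np
import torch
from torchvision import transforms

# 16-point grid for the initial learning rate, log10-spaced from 1e-7 to 1e-1.
lr_grid = np.logspace(-7, -1, num=16)

# Preprocessing as quoted: no augmentation, center crop to 224 pixels,
# normalization by the standard ImageNet mean/variance.
preprocess = transforms.Compose([
    transforms.Resize(256),              # assumed resize before the crop
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Cosine annealing over the current epoch budget; the budget starts at 30
# epochs and is extended by 30 (up to 4 times) if accuracy stays below 100%.
model = torch.nn.Linear(224 * 224 * 3, 1000)   # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=float(lr_grid[0]),
                            momentum=0.9, weight_decay=0.0)
epochs = 30
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
```

In the quoted procedure, each grid point would be run for 100 epochs on a single seed before the best initial learning rate is kept.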