Attention-Gated Brain Propagation: How the brain can implement reward-based error backpropagation

Authors: Isabella Pozzi, Sander M. Bohté, Pieter R. Roelfsema

NeurIPS 2020

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We demonstrate successful learning of deep fully connected, convolutional and locally connected networks on classical and hard image-classification benchmarks: MNIST, CIFAR10, CIFAR100 and Tiny ImageNet. BrainProp achieves an accuracy that is equivalent to that of standard error-backpropagation, and better than state-of-the-art biologically inspired learning schemes." (A minimal sketch of the gated update appears after the table.) |
| Researcher Affiliation | Academia | Isabella Pozzi, Machine Learning Group, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands (isabella.pozzi@cwi.nl); Sander M. Bohté, Machine Learning Group, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands (s.m.bohte@cwi.nl); Pieter R. Roelfsema, Vision & Cognition Group, Netherlands Institute for Neuroscience, Amsterdam, The Netherlands (p.roelfsema@nin.knaw.nl) |
| Pseudocode | No | The paper does not contain pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | "The code and selected pre-trained models are available at https://github.com/isapome/BrainProp." |
| Open Datasets | Yes | "We evaluated the performance of BrainProp on the MNIST, CIFAR10, CIFAR100 and Tiny ImageNet [42] data sets. The MNIST dataset consists of 60,000 training samples (i.e. images of 28 by 28 pixels), while the CIFAR datasets comprise 50,000 training samples (RGB images of 32 by 32 pixels) and Tiny ImageNet has 100,000 images (of 64 by 64 pixels) equally divided across 200 classes." (A shape check against these figures appears below.) |
| Dataset Splits | No | "At the end of each epoch, a validation accuracy was calculated on the validation dataset. We used an early stopping criterion and stopped training if the validation accuracy had not increased for 45 consecutive epochs (by the third decimal)." The paper describes a validation procedure but does not specify the train/validation split sizes. (A callback sketch of this criterion appears below.) |
| Hardware Specification | No | The paper mentions only "GPU memory" without specifying any particular GPU model, CPU, or other hardware used for the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for the software dependencies or libraries used (e.g., Python, PyTorch, or TensorFlow versions). |
| Experiment Setup | Yes | "For the fully connected network we used a schedule with a learning rate starting at 1 which was halved every 100 epochs. The weights were randomly initialized from a normal distribution with a zero mean and 0.1 standard deviation. For all the other experiments presented in this paper we used a learning rate of 0.1 as a starting value for the schedule and a standard deviation of 0.005 for the weight initialization. To fit into GPU memory, we used a batch size of 32 for locally connected networks, while for all the other experiments we used a batch size of 128. We used the same hyperparameters as in the more shallow networks, but added L2 regularization of 0.0005." (A configuration sketch appears below.) |