Differentiable plasticity: training plastic neural networks with backpropagation
Authors: Thomas Miconi, Kenneth Stanley, Jeff Clune
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that plasticity, just like connection weights, can be optimized by gradient descent in large (millions of parameters) recurrent networks with Hebbian plastic connections. First, recurrent plastic networks with more than two million parameters can be trained to memorize and reconstruct sets of novel, high-dimensional (1,000+ pixels) natural images not seen during training. Crucially, traditional non-plastic recurrent networks fail to solve this task. Furthermore, trained plastic networks can also solve generic meta-learning tasks such as the Omniglot task, with competitive results and little parameter overhead. Finally, in reinforcement learning settings, plastic networks outperform a non-plastic equivalent in a maze exploration task. (See the plasticity-rule sketch after the table.) |
| Researcher Affiliation | Industry | Uber AI Labs. Correspondence to: Thomas Miconi <tmiconi@uber.com>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks in the main text. |
| Open Source Code | Yes | The code for all experiments described in this paper is available at https://github.com/uber-common/differentiable-plasticity |
| Open Datasets | Yes | Images are from the CIFAR-10 database, which contains 60,000 images of size 32 by 32 pixels (i.e. 1,024 pixels in total), converted to grayscale pixels between 0 and 1.0. (See the data-loading sketch after the table.) |
| Dataset Splits | No | The paper mentions training and test sets (e.g., '1,523 classes for training and 100 classes... for testing' for Omniglot) but does not provide specific train/validation/test splits with percentages or counts for all experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models or processor types used for running experiments. |
| Software Dependencies | No | All experiments reported here use the PyTorch package to compute gradients. However, no specific version number for PyTorch or other software dependencies is provided. |
| Experiment Setup | Yes | The gradient of this error over the w_{i,j} and α_{i,j} coefficients is then computed by backpropagation, and these coefficients are optimized through an Adam solver (Kingma & Ba, 2015) with learning rate 0.001. (See the training-step sketch after the table.) |
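
For concreteness, the plasticity rule the quoted abstract refers to can be written in a few lines of PyTorch. The sketch below follows the update equations described in the paper: each connection has a fixed weight w_{i,j} plus a plastic term α_{i,j}·Hebb_{i,j}, and the Hebbian trace is updated as a running average of pre- and post-synaptic activity products. The class name, layer size, tanh nonlinearity, and η value are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class PlasticRecurrentLayer(nn.Module):
    """Sketch of a fully recurrent layer with differentiable Hebbian plasticity."""

    def __init__(self, n_units, eta=0.01):
        super().__init__()
        self.eta = eta                                                    # Hebbian trace learning rate
        self.w = nn.Parameter(0.01 * torch.randn(n_units, n_units))       # fixed component w_{i,j}
        self.alpha = nn.Parameter(0.01 * torch.randn(n_units, n_units))   # plasticity coefficients alpha_{i,j}

    def forward(self, x_prev, hebb):
        # x_prev: (batch, n) activations at time t-1; hebb: (batch, n, n) Hebbian traces.
        w_eff = self.w + self.alpha * hebb                                # per-sample effective weights
        x = torch.tanh(torch.bmm(x_prev.unsqueeze(1), w_eff).squeeze(1))
        # "Simple decay" Hebbian update: running average of outer products of pre/post activity.
        hebb = (1.0 - self.eta) * hebb + self.eta * torch.bmm(x_prev.unsqueeze(2), x.unsqueeze(1))
        return x, hebb
```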
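The training setup quoted in the Experiment Setup row (backpropagation through the w_{i,j} and α_{i,j} coefficients, Adam with learning rate 0.001) then reduces to an ordinary PyTorch training step, since both sets of coefficients are regular parameters. The episode length, inputs, and loss below are placeholders rather than the paper's actual image-reconstruction protocol.

```python
# Hypothetical training step reusing the PlasticRecurrentLayer sketch above.
model = PlasticRecurrentLayer(n_units=256)                  # the paper's image task uses 1,024-pixel images
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # learning rate stated in the paper

batch, n = 8, 256
x = torch.rand(batch, n)             # placeholder input pattern clamped onto the units
hebb = torch.zeros(batch, n, n)      # Hebbian traces start at zero for each episode
target = torch.rand(batch, n)        # placeholder pattern to reconstruct

optimizer.zero_grad()
for _ in range(10):                  # placeholder episode length
    x, hebb = model(x, hebb)
loss = ((x - target) ** 2).mean()    # placeholder reconstruction error
loss.backward()                      # gradients flow into both w and alpha
optimizer.step()
```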
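Finally, the CIFAR-10 preprocessing described in the Open Datasets row (32×32 images converted to grayscale with pixel values in [0, 1], i.e. 1,024 pixels per image) can be approximated with torchvision. The paper does not say how the authors loaded the data, so treat this purely as one way to obtain equivalent inputs; torchvision's Grayscale uses an ITU-R 601-2 luma transform, and the paper's exact grayscale conversion is unspecified, so pixel values may differ slightly.

```python
import torchvision
import torchvision.transforms as T

# Grayscale 32x32 CIFAR-10 images with pixel values in [0, 1], flattened to 1,024-vectors.
transform = T.Compose([
    T.Grayscale(num_output_channels=1),    # RGB -> single luminance channel
    T.ToTensor(),                          # PIL image -> float tensor in [0, 1]
    T.Lambda(lambda img: img.view(-1)),    # (1, 32, 32) -> (1024,)
])
cifar_train = torchvision.datasets.CIFAR10(root="./data", train=True,
                                           download=True, transform=transform)
image, _ = cifar_train[0]                  # image.shape == torch.Size([1024])
```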