Continual Learning Through Synaptic Intelligence
Authors: Friedemann Zenke, Ben Poole, Surya Ganguli
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments: We evaluated our approach for continual learning on the split and permuted MNIST (LeCun et al., 1998; Goodfellow et al., 2013), and split versions of CIFAR-10 and CIFAR-100 (Krizhevsky & Hinton, 2009). |
| Researcher Affiliation | Academia | 1Stanford University. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | No explicit statement or link providing access to source code for the methodology was found. |
| Open Datasets | Yes | We evaluated our approach for continual learning on the split and permuted MNIST (LeCun et al., 1998; Goodfellow et al., 2013), and split versions of CIFAR-10 and CIFAR-100 (Krizhevsky & Hinton, 2009). |
| Dataset Splits | Yes | However, here we used ξ = 0.1, and the value c = 0.1 was determined via a coarse grid search on a held-out validation set. |
| Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running experiments were mentioned. |
| Software Dependencies | No | The paper mentions using 'Adam (Kingma & Ba, 2014)' but does not specify software versions for programming languages, libraries, or other dependencies. |
| Experiment Setup | Yes | We used a small multi-layer perceptron (MLP) with only two hidden layers consisting of 256 units each with ReLU nonlinearities, and a standard categorical cross-entropy loss function plus our consolidation cost term (with damping parameter ξ = 1 × 10⁻³). To avoid the complication of crosstalk between digits at the readout layer due to changes in the label distribution during training, we used a multi-head approach in which the categorical cross-entropy loss at the readout layer was computed only for the digits present in the current task. Finally, we optimized our network using a minibatch size of 64 and trained for 10 epochs. To achieve good absolute performance with a smaller number of epochs we used the adaptive optimizer Adam (η = 1 × 10⁻³, β1 = 0.9, β2 = 0.999). |
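
The experiment-setup row above is detailed enough to sketch in code. Below is a minimal, hedged sketch assuming PyTorch (neither the quoted excerpt nor this table names a framework); `MultiHeadMLP`, `consolidation_penalty`, `omega`, and `theta_prev` are hypothetical names, the five two-way heads reflect the split-MNIST protocol quoted above, and the synaptic-intelligence bookkeeping that would populate the per-parameter importance weights is not shown.

```python
# Hedged sketch of the quoted setup: 2 hidden layers of 256 ReLU units,
# per-task readout heads, Adam(lr=1e-3, betas=(0.9, 0.999)), minibatches of 64,
# 10 epochs per task, and a quadratic consolidation penalty with strength c.
# PyTorch is an assumption; `omega` and `theta_prev` are hypothetical inputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadMLP(nn.Module):
    """Two 256-unit ReLU hidden layers with one small readout head per task."""
    def __init__(self, n_tasks=5, in_dim=784, classes_per_task=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.heads = nn.ModuleList(
            [nn.Linear(256, classes_per_task) for _ in range(n_tasks)]
        )

    def forward(self, x, task_id):
        # Only the current task's head is used, so the cross-entropy loss is
        # computed only over the classes present in that task (multi-head setup).
        return self.heads[task_id](self.body(x))

def consolidation_penalty(model, omega, theta_prev, c=0.1):
    """Quadratic penalty c * sum_k Omega_k * (theta_k - theta_prev_k)^2.
    `omega` and `theta_prev` map parameter names to constant (detached) tensors
    assumed to come from the synaptic-intelligence bookkeeping (not shown here)."""
    penalty = torch.zeros(())
    for name, p in model.named_parameters():
        penalty = penalty + (omega[name] * (p - theta_prev[name]) ** 2).sum()
    return c * penalty

model = MultiHeadMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))

def train_task(loader, task_id, omega, theta_prev, epochs=10):
    """Train on one task; `loader` is assumed to yield minibatches of size 64."""
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            logits = model(x.view(x.size(0), -1), task_id)
            loss = F.cross_entropy(logits, y) + consolidation_penalty(
                model, omega, theta_prev
            )
            loss.backward()
            opt.step()
```

The multi-head readout mirrors the quoted rationale: restricting the cross-entropy to the classes of the current task avoids crosstalk at the output layer when the label distribution shifts between tasks, and the quadratic penalty stands in for what the excerpt calls the consolidation cost term, with c and ξ set as described in the rows above.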