Sparse Distributed Memory is a Continual Learner
Authors: Trenton Bricken, Xander Davies, Deepak Singh, Dmitry Krotov, Gabriel Kreiman
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental Setup: Trying to make the continual learning setting as realistic as possible, we use Split CIFAR10 in the class incremental setting with pretraining on ImageNet (Russakovsky et al., 2015). This splits CIFAR10 into disjoint subsets that each contain two of the classes. For example, the first data split contains classes 5 and 2, the second split contains classes 7 and 9, etc. CIFAR is more complex than MNIST and captures real-world statistical properties of images. The class incremental setting is more difficult than incremental task learning because predictions are made for every CIFAR class instead of just between the two classes in the current task (Hsu et al., 2018; Farquhar & Gal, 2018). Pretraining on ImageNet enables learning general image statistics and is when the GABA switch happens, allowing neurons to specialize and spread across the data manifold. In the main text, we present results where our ImageNet32 and CIFAR datasets have been compressed into 256-dimensional latent embeddings taken from the last layer of a frozen ConvMixer that was pre-trained on ImageNet32 (step #1 of our training regime in Fig. 2) (Trockman & Kolter, 2022; Russakovsky et al., 2015). (A seeded split-construction sketch follows this table.) |
| Researcher Affiliation | Collaboration | Trenton Bricken, Systems, Synthetic, and Quantitative Biology, Harvard University (trentonbricken@g.harvard.edu); Xander Davies & Deepak Singh, Computer Science, Harvard College ({alexander_davies, tejasvisingh}@college.harvard.edu); Dmitry Krotov, MIT-IBM Watson AI Lab, IBM Research (krotov@ibm.com); Gabriel Kreiman, Harvard Medical School, Programs in Biophysics and Neuroscience (Gabriel.Kreiman@childrens.harvard.edu) |
| Pseudocode | Yes | Appendix A.1, SDM Training Algorithm (Algorithm 1: SDMLP Training Algorithm). (A hedged Top-K MLP sketch appears after this table.) |
| Open Source Code | Yes | All of our code and training parameters can be found in our publicly available codebase: https://github.com/TrentBrick/SDMContinualLearner |
| Open Datasets | Yes | Experimental Setup: Trying to make the continual learning setting as realistic as possible, we use Split CIFAR10 in the class incremental setting with pretraining on ImageNet (Russakovsky et al., 2015). |
| Dataset Splits | Yes | The CIFAR dataset is split into disjoint sets using five different random seeds to ensure our results are independent of both data split ordering and the classes each split contains. |
| Hardware Specification | No | The paper mentions 'GPU time allocated' but does not specify any particular GPU models, CPU models, or other detailed hardware specifications used for running the experiments. |
| Software Dependencies | No | The Acknowledgements section lists several open-source software libraries (NumPy, Pandas, SciPy, Matplotlib, PyTorch, and Anaconda) but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | All models are MLPs with 1,000 neurons in a single hidden layer unless otherwise indicated. When using the Top-K activation function, we set k_target = 10 and also present k_target = 1. We tested additional k values and suggest how to choose the best k_target value in App. C.2. Because the k values considered are highly sparse, saving on FLOPs and memory consumption, we also evaluate the 10,000 neuron setting, which improves the continual learning abilities of SDM and the Fly Model in particular. ... We train for 2,000 epochs on each task and 10,000 epochs in total. (A sketch of this training schedule appears after the table.) |
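
The rows above outline the Split CIFAR10 class-incremental protocol: the ten classes are grouped into five disjoint pairs, and the grouping is repeated with five different random seeds. The sketch below is a minimal illustration of that protocol under stated assumptions; the helper names, seed values, and use of NumPy are ours, not taken from the released codebase.

```python
# Hypothetical sketch of the Split CIFAR10 class-incremental splits:
# the 10 classes are shuffled with a per-run seed and grouped into
# 5 disjoint tasks of 2 classes each.
import numpy as np


def make_class_incremental_splits(num_classes: int = 10,
                                  classes_per_task: int = 2,
                                  seed: int = 0) -> list:
    """Return a seeded list of disjoint class tuples, one per task."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_classes)
    return [tuple(int(c) for c in order[i:i + classes_per_task])
            for i in range(0, num_classes, classes_per_task)]


# Five different seeds, as described in the Dataset Splits row, so results
# do not depend on split ordering or on which classes share a task.
for seed in range(5):
    print(seed, make_class_incremental_splits(seed=seed))
```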
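
The paper's Algorithm 1 (SDMLP Training Algorithm) is referenced in the Pseudocode row but not reproduced here. As a rough, assumption-laden sketch of the architecture named in the Experiment Setup row, the PyTorch module below uses a single hidden layer of 1,000 neurons with a Top-K activation that keeps only the k_target most active neurons per example; the 256-dimensional input (the ConvMixer embedding size), the absence of biases, and all other details are illustrative rather than the authors' exact implementation.

```python
# Minimal PyTorch sketch of a Top-K MLP in the spirit of the SDMLP
# described above; layer sizes and the lack of biases are assumptions.
import torch
import torch.nn as nn


class TopK(nn.Module):
    """Keep the k largest activations per example and zero out the rest."""

    def __init__(self, k: int):
        super().__init__()
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, indices = torch.topk(x, self.k, dim=-1)
        mask = torch.zeros_like(x).scatter_(-1, indices, 1.0)
        return x * mask


class TopKMLP(nn.Module):
    """Single hidden layer of `hidden` neurons followed by a Top-K activation."""

    def __init__(self, in_dim: int = 256, hidden: int = 1_000,
                 num_classes: int = 10, k_target: int = 10):
        super().__init__()
        self.encoder = nn.Linear(in_dim, hidden, bias=False)
        self.activation = TopK(k_target)
        self.decoder = nn.Linear(hidden, num_classes, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.activation(self.encoder(x)))


model = TopKMLP()                      # 256-d frozen-ConvMixer embeddings in, 10 classes out
logits = model(torch.randn(32, 256))   # e.g. a batch of latent embeddings
```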
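
Finally, the training schedule in the Experiment Setup row (2,000 epochs on each of five tasks, 10,000 epochs in total, with predictions made over all ten classes) could be organized roughly as below; the optimizer, learning rate, and loss are placeholders, not settings reported in the paper.

```python
# Rough sketch of the class-incremental training schedule: 2,000 epochs
# per task over 5 tasks (10,000 epochs total). Optimizer, learning rate,
# and loss are illustrative placeholders.
import torch
import torch.nn.functional as F


def train_class_incremental(model, tasks, features, labels,
                            epochs_per_task: int = 2_000, lr: float = 1e-3):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for task_classes in tasks:                       # e.g. [(5, 2), (7, 9), ...]
        mask = torch.isin(labels, torch.tensor(task_classes))
        x, y = features[mask], labels[mask]          # only the current task's examples
        for _ in range(epochs_per_task):
            optimizer.zero_grad()
            # Class-incremental: logits span all 10 classes, not just this pair.
            loss = F.cross_entropy(model(x), y)
            loss.backward()
            optimizer.step()
    return model
```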