Scalable Recollections for Continual Lifelong Learning
Authors: Matthew Riemer, Tim Klinger, Djallel Bouneffouf, Michele Franceschini
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present a novel scalable architecture and training algorithm in this challenging domain and provide an extensive evaluation of its performance. Our results show that we can achieve considerable gains on top of state-of-the-art methods such as GEM. |
| Researcher Affiliation | Industry | Matthew Riemer, Tim Klinger, Djallel Bouneffouf, Michele Franceschini IBM Research T.J. Watson Research Center, Yorktown Heights, NY {mdriemer, tklinger, djallel.bouneffouf, franceschini}@us.ibm.com |
| Pseudocode | Yes | Algorithm 1 Experience Replay Training for Continual Learning with a Scalable Recollection Module |
| Open Source Code | No | 1See an extended version of this paper including the appendix at https://arxiv.org/pdf/1711.06761.pdf. This link points to the paper itself on arXiv, not to source code. |
| Open Datasets | Yes | MNIST-Rotations: (Lopez-Paz and Ranzato 2017) A dataset with 20 tasks including 1,000 training examples for each task. Incremental CIFAR-100: (Lopez-Paz and Ranzato 2017) A continual learning split of the CIFAR-100 image classification dataset considering each of the 20 coarse-grained labels to be a task with 2,500 examples each. Omniglot: A character recognition dataset (Lake et al. 2011) in which we consider each of the 50 alphabets to be a task. |
| Dataset Splits | No | The paper does not explicitly provide specific train/validation/test dataset splits (e.g., percentages, exact counts) or mention a validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | Architecture: We model our experiments after (Lopez-Paz and Ranzato 2017) and use a Resnet-18 model as Fθ for CIFAR-100 and Omniglot, as well as a two-layer MLP with 200 hidden units for MNIST-Rotations. Across all of our experiments, our autoencoder models include three convolutional layers in the encoder and three deconvolutional layers in the decoder. Each convolutional layer has a kernel size of 5. As we vary the size of our categorical latent variable across experiments, we in turn adjust the number of filters in each convolutional layer to keep the number of hidden variables consistent at all intermediate layers of the network. Module hyperparameters: In our experiments we used a binary cross entropy loss for both ℓ and ℓ_REC. (A hedged sketch of this autoencoder follows the table.) |
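
The autoencoder described in the experiment setup row translates into a short structural sketch. The following is a minimal PyTorch illustration, assuming 32x32 single-channel inputs, stride-2 convolutions, and a fixed filter count `n_filters`; the paper's categorical latent variable is omitted here, and these shapes and hyperparameters are assumptions rather than the authors' exact configuration. Only the three-conv/three-deconv layout, the kernel size of 5, and the binary cross entropy reconstruction loss (ℓ_REC) are taken from the paper.

```python
# Hypothetical sketch of the recollection autoencoder: three conv layers in
# the encoder, three deconv layers in the decoder, kernel size 5.
# Filter counts, strides, padding, and input size are assumptions.
import torch
import torch.nn as nn

class RecollectionAutoencoder(nn.Module):
    def __init__(self, in_channels=1, n_filters=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, n_filters, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.Conv2d(n_filters, n_filters, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.Conv2d(n_filters, n_filters, kernel_size=5, stride=2, padding=2),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(n_filters, n_filters, kernel_size=5, stride=2,
                               padding=2, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(n_filters, n_filters, kernel_size=5, stride=2,
                               padding=2, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(n_filters, in_channels, kernel_size=5, stride=2,
                               padding=2, output_padding=1),
            nn.Sigmoid(),  # outputs in [0, 1] for binary cross entropy
        )

    def forward(self, x):
        z = self.encoder(x)        # continuous latent stands in for the
        return self.decoder(z)     # paper's categorical latent variable

# Reconstruction loss ℓ_REC as binary cross entropy, per the paper.
model = RecollectionAutoencoder()
x = torch.rand(8, 1, 32, 32)       # e.g. padded 32x32 MNIST-style inputs
loss = nn.functional.binary_cross_entropy(model(x), x)
```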