Scalable Recollections for Continual Lifelong Learning

Authors: Matthew Riemer, Tim Klinger, Djallel Bouneffouf, Michele Franceschini

AAAI 2019, pp. 1352-1359 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present a novel scalable architecture and training algorithm in this challenging domain and provide an extensive evaluation of its performance. Our results show that we can achieve considerable gains on top of state-of-the-art methods such as GEM.
Researcher Affiliation | Industry | Matthew Riemer, Tim Klinger, Djallel Bouneffouf, Michele Franceschini, IBM Research, T.J. Watson Research Center, Yorktown Heights, NY. {mdriemer, tklinger, djallel.bouneffouf, franceschini}@us.ibm.com
Pseudocode | Yes | Algorithm 1: Experience Replay Training for Continual Learning with a Scalable Recollection Module (see the training-loop sketch after this table).
Open Source Code | No | Footnote 1: "See an extended version of this paper including the appendix at https://arxiv.org/pdf/1711.06761.pdf." This link points to the paper itself on arXiv, not to source code.
Open Datasets | Yes | MNIST-Rotations (Lopez-Paz and Ranzato 2017): a dataset with 20 tasks including 1,000 training examples for each task. Incremental CIFAR-100 (Lopez-Paz and Ranzato 2017): a continual learning split of the CIFAR-100 image classification dataset considering each of the 20 coarse-grained labels to be a task with 2,500 examples each. Omniglot (Lake et al. 2011): a character recognition dataset in which each of the 50 alphabets is considered a task. (See the task-split sketch after this table.)
Dataset Splits | No | The paper does not explicitly provide specific train/validation/test dataset splits (e.g., percentages, exact counts) or mention a validation set.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | Architecture: We model our experiments after (Lopez-Paz and Ranzato 2017) and use a ResNet-18 model as Fθ for CIFAR-100 and Omniglot, as well as a two-layer MLP with 200 hidden units for MNIST-Rotations. Across all of our experiments, our autoencoder models include three convolutional layers in the encoder and three deconvolutional layers in the decoder. Each convolutional layer has a kernel size of 5. As we vary the size of our categorical latent variable across experiments, we in turn vary the number of filters in each convolutional layer to keep the number of hidden variables consistent at all intermediate layers of the network. Module hyperparameters: In our experiments we used a binary cross-entropy loss for both ℓ and ℓ_REC. (See the architecture sketch after this table.)
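
The Pseudocode row points to Algorithm 1 (Experience Replay Training for Continual Learning with a Scalable Recollection Module), which is only named, not reproduced, in this summary. The snippet below is a minimal PyTorch-style sketch of the general idea that title describes: keep compressed latent codes of past examples and interleave decoded "recollections" with each new-task batch. Every name here (`task_model`, `autoencoder`, `code_buffer`, `replay_batch_size`) is a hypothetical placeholder, and details such as buffer size management and training of the recollection module are omitted; consult the paper for the actual algorithm.

```python
import random
import torch
import torch.nn.functional as F

def replay_training_step(task_model, autoencoder, code_buffer,
                         x_new, y_new, optimizer, replay_batch_size=32):
    """One hedged training step: new-task data plus decoded recollections.

    code_buffer is a list of (latent_code, label) pairs from earlier tasks;
    autoencoder.encode / autoencoder.decode are assumed methods of some
    recollection autoencoder (see the architecture sketch further below).
    """
    optimizer.zero_grad()

    # Loss on the incoming batch from the current task.
    loss = F.cross_entropy(task_model(x_new), y_new)

    # Interleave replayed recollections reconstructed from stored codes.
    if len(code_buffer) >= replay_batch_size:
        codes, labels = zip(*random.sample(code_buffer, replay_batch_size))
        with torch.no_grad():
            x_replay = autoencoder.decode(torch.stack(codes))
        loss = loss + F.cross_entropy(task_model(x_replay), torch.stack(labels))

    loss.backward()
    optimizer.step()

    # Store compressed codes (not raw inputs) for future replay; any cap on
    # buffer size or selection strategy is left out of this sketch.
    with torch.no_grad():
        new_codes = autoencoder.encode(x_new)
    code_buffer.extend(zip(new_codes, y_new))
```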
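
The Open Datasets row describes Incremental CIFAR-100 as one task per coarse-grained label with 2,500 examples each. The sketch below is one illustrative way to build that split directly from the original CIFAR-100 python pickle, which stores both fine and coarse labels; it follows the quoted description rather than any split script released with (Lopez-Paz and Ranzato 2017), so details such as per-task label remapping may differ.

```python
import pickle
import numpy as np

def load_cifar100_tasks(train_pickle_path):
    """Group CIFAR-100 training images into 20 tasks, one per coarse label.

    Assumes the original python-version pickle, whose dict contains
    b'data', b'fine_labels', and b'coarse_labels' fields.
    """
    with open(train_pickle_path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")

    images = batch[b"data"].reshape(-1, 3, 32, 32)   # 50,000 training images
    fine = np.array(batch[b"fine_labels"])           # 100 fine classes
    coarse = np.array(batch[b"coarse_labels"])       # 20 superclasses

    tasks = {}
    for task_id in range(20):
        idx = np.where(coarse == task_id)[0]         # 2,500 examples per task
        tasks[task_id] = (images[idx], fine[idx])
    return tasks
```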
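
The Experiment Setup row fixes only part of each architecture. The sketch below is one possible PyTorch reading of it, not the authors' code: the MNIST-Rotations MLP is interpreted as two hidden layers of 200 units, and the recollection autoencoder uses three 5x5 convolutions and three transposed convolutions sized for 32x32 inputs; filter counts, strides, and the categorical latent layer are placeholders the quote does not specify.

```python
import torch.nn as nn

# MLP for MNIST-Rotations. "Two layer MLP with 200 hidden units" is read here
# as two hidden layers of 200 units; the 784-dim input and 10-way output are
# standard MNIST assumptions, not stated in the quote.
mlp = nn.Sequential(
    nn.Linear(784, 200), nn.ReLU(),
    nn.Linear(200, 200), nn.ReLU(),
    nn.Linear(200, 10),
)


class RecollectionAutoencoder(nn.Module):
    """Skeleton matching the stated shape: three 5x5 convolutional layers in
    the encoder and three deconvolutional layers in the decoder, sized for
    32x32 inputs. Filter counts, strides, and the categorical latent layer are
    placeholders; the paper varies filters with the latent size.
    """

    def __init__(self, channels=3, filters=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, filters, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(filters, filters, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(filters, filters, kernel_size=5, stride=2, padding=2), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(filters, filters, kernel_size=5, stride=2,
                               padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(filters, filters, kernel_size=5, stride=2,
                               padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(filters, channels, kernel_size=5, stride=2,
                               padding=2, output_padding=1),
            # Sigmoid output pairs with a binary cross-entropy reconstruction loss.
            nn.Sigmoid(),
        )

    def encode(self, x):
        return self.encoder(x)

    def decode(self, z):
        return self.decoder(z)

    def forward(self, x):
        return self.decode(self.encode(x))
```

Training the reconstruction with a binary cross-entropy loss (e.g., `nn.BCELoss()` on inputs scaled to [0, 1]) is consistent with the stated choice of binary cross entropy for both ℓ and ℓ_REC.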