Efficient Continual Learning with Modular Networks and Task-Driven Priors

Authors: Tom Veniat, Ludovic Denoyer, Marc'Aurelio Ranzato

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that this modular architecture and learning algorithm perform competitively on widely used CL benchmarks while yielding superior performance on the more challenging benchmarks we introduce in this work.
Researcher Affiliation | Collaboration | Tom Veniat, LIP6, Sorbonne Université, France (tom.veniat@lip6.fr); Ludovic Denoyer & Marc'Aurelio Ranzato, Facebook Artificial Intelligence Research ({denoyer,ranzato}@fb.com)
Pseudocode | Yes | Algorithm 1: MNTDP-S algorithm. [...] Algorithm 2: MNTDP-D algorithm.
Open Source Code | Yes | Pytorch implementation of the experiments available here: https://github.com/TomVeniat/MNTDP.
Open Datasets | Yes | The CTrL (Continual Transfer Learning) benchmark is a collection of streams of tasks built over seven popular computer vision datasets, namely: CIFAR10 and CIFAR100 (Krizhevsky, 2009), DTD (Cimpoi et al., 2014), SVHN (Netzer et al., 2011), MNIST (LeCun et al., 1998), Rainbow MNIST (Finn et al., 2019) and Fashion MNIST (Xiao et al., 2017);
Dataset Splits | Yes | Each task consists of training, validation, and test datasets corresponding to a 5-way and 10-way classification problem for the transfer streams and the long stream, respectively.
Hardware Specification | Yes | To match the capacity of MNTDP, we scale HAT's backbone to the maximal size that can fit in a Titan X GPU memory (6.5x, wide version).
Software Dependencies | No | The information is insufficient. The paper mentions using the 'Adam optimizer' and 'Pytorch implementation' (in the code link text), but it does not provide specific version numbers for these or other key software components (e.g., PyTorch version, Python version, CUDA version).
Experiment Setup | Yes | For all methods and experiments, we use the Adam optimizer (Kingma & Ba, 2015) with β1 = 0.9, β2 = 0.999 and ε = 10^-8. For each task and each baseline, two learning rates {10^-2, 10^-3} and 3 weight decay strengths {0, 10^-5, 10^-4} are considered.
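The reported experiment setup amounts to a small per-task hyperparameter search. The sketch below is a minimal illustration, not code from the paper's repository: the dictionary keys (`lr`, `weight_decay`, `betas`, `eps`) are assumed to follow PyTorch's `torch.optim.Adam` argument names, and the final selection of the best configuration on validation data is left out.

```python
import itertools

# Adam settings quoted in the Experiment Setup row (Kingma & Ba, 2015).
# Key names follow PyTorch's torch.optim.Adam convention (assumption).
adam_config = {"betas": (0.9, 0.999), "eps": 1e-8}

# Reported search grid: 2 learning rates x 3 weight decay strengths,
# giving 6 candidate runs per task and per baseline.
learning_rates = [1e-2, 1e-3]
weight_decays = [0.0, 1e-5, 1e-4]

search_grid = [
    {"lr": lr, "weight_decay": wd, **adam_config}
    for lr, wd in itertools.product(learning_rates, weight_decays)
]

print(len(search_grid))  # 6
```

Each configuration would then be passed to the optimizer for one training run, with the winner chosen by validation performance on that task.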