Efficient Continual Learning with Modular Networks and Task-Driven Priors
Authors: Tom Veniat, Ludovic Denoyer, Marc'Aurelio Ranzato
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that this modular architecture and learning algorithm perform competitively on widely used CL benchmarks while yielding superior performance on the more challenging benchmarks we introduce in this work. |
| Researcher Affiliation | Collaboration | Tom Veniat LIP6, Sorbonne Université, France tom.veniat@lip6.fr Ludovic Denoyer & Marc'Aurelio Ranzato Facebook Artificial Intelligence Research {denoyer,ranzato}@fb.com |
| Pseudocode | Yes | Algorithm 1: MNTDP-S algorithm. [...] Algorithm 2: MNTDP-D algorithm. |
| Open Source Code | Yes | PyTorch implementation of the experiments available here: https://github.com/TomVeniat/MNTDP. |
| Open Datasets | Yes | The CTrL (Continual Transfer Learning) benchmark is a collection of streams of tasks built over seven popular computer vision datasets, namely: CIFAR10 and CIFAR100 (Krizhevsky, 2009), DTD (Cimpoi et al., 2014), SVHN (Netzer et al., 2011), MNIST (LeCun et al., 1998), Rainbow MNIST (Finn et al., 2019) and Fashion MNIST (Xiao et al., 2017); |
| Dataset Splits | Yes | Each task consists of training, validation, and test datasets corresponding to a 5-way and 10-way classification problem for the transfer streams and the long stream, respectively. |
| Hardware Specification | Yes | To match the capacity of MNTDP, we scale HAT's backbone to the maximal size that can fit in a Titan X GPU memory (6.5x, wide version). |
| Software Dependencies | No | The information is insufficient. The paper mentions using the 'Adam optimizer' and 'PyTorch implementation' (in the code link text), but it does not provide specific version numbers for these or other key software components (e.g., PyTorch version, Python version, CUDA version). |
| Experiment Setup | Yes | For all methods and experiments, we use the Adam optimizer (Kingma & Ba, 2015) with β1 = 0.9, β2 = 0.999 and ε = 10⁻⁸. For each task and each baseline, two learning rates {10⁻², 10⁻³} and 3 weight decay strengths {0, 10⁻⁵, 10⁻⁴} are considered. |
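The hyperparameter search in the Experiment Setup row can be sketched in plain Python. This is an illustrative reconstruction, not the authors' code: names like `adam_config` and `grid` are our own, and the actual MNTDP repository may structure its sweep differently.

```python
import itertools

# Adam settings reported in the paper (Kingma & Ba, 2015).
adam_config = {"betas": (0.9, 0.999), "eps": 1e-8}

# Per-task search space from the paper: 2 learning rates x 3 weight decays.
learning_rates = [1e-2, 1e-3]
weight_decays = [0.0, 1e-5, 1e-4]

grid = [
    {"lr": lr, "weight_decay": wd, **adam_config}
    for lr, wd in itertools.product(learning_rates, weight_decays)
]

print(len(grid))  # 6 configurations evaluated per task and baseline
```

Each configuration would then be trained and scored on the task's validation split, with the best one reported on the test split.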