Efficient Continual Learning with Modular Networks and Task-Driven Priors

Authors: Tom Veniat, Ludovic Denoyer, Marc'Aurelio Ranzato

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that this modular architecture and learning algorithm perform competitively on widely used CL benchmarks while yielding superior performance on the more challenging benchmarks we introduce in this work.
Researcher Affiliation | Collaboration | Tom Veniat, LIP6, Sorbonne Université, France (tom.veniat@lip6.fr); Ludovic Denoyer & Marc'Aurelio Ranzato, Facebook Artificial Intelligence Research ({denoyer,ranzato}@fb.com)
Pseudocode | Yes | Algorithm 1: MNTDP-S algorithm. [...] Algorithm 2: MNTDP-D algorithm.
Open Source Code | Yes | Pytorch implementation of the experiments available here: https://github.com/TomVeniat/MNTDP.
Open Datasets | Yes | The CTrL (Continual Transfer Learning) benchmark is a collection of streams of tasks built over seven popular computer vision datasets, namely: CIFAR10 and CIFAR100 (Krizhevsky, 2009), DTD (Cimpoi et al., 2014), SVHN (Netzer et al., 2011), MNIST (LeCun et al., 1998), Rainbow MNIST (Finn et al., 2019) and Fashion MNIST (Xiao et al., 2017);
Dataset Splits | Yes | Each task consists of training, validation, and test datasets corresponding to a 5-way and 10-way classification problem for the transfer streams and the long stream, respectively.
Hardware Specification | Yes | To match the capacity of MNTDP, we scale HAT's backbone to the maximal size that can fit in a Titan X GPU memory (6.5x, wide version).
Software Dependencies | No | The information is insufficient. The paper mentions using the 'Adam optimizer' and 'Pytorch implementation' (in the code link text), but it does not provide specific version numbers for these or other key software components (e.g., PyTorch version, Python version, CUDA version).
Experiment Setup | Yes | For all methods and experiments, we use the Adam optimizer (Kingma & Ba, 2015) with β1 = 0.9, β2 = 0.999 and ε = 10^-8. For each task and each baseline, two learning rates {10^-2, 10^-3} and 3 weight decay strengths {0, 10^-5, 10^-4} are considered.
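The reported experiment setup amounts to a small per-task hyperparameter search. The sketch below is a minimal illustration, not code from the paper's repository: the dictionary keys (`lr`, `weight_decay`, `betas`, `eps`) are assumed to follow PyTorch's `torch.optim.Adam` argument names, and the final selection of the best configuration on validation data is left out.

```python
import itertools

# Adam settings quoted in the Experiment Setup row (Kingma & Ba, 2015).
# Key names follow PyTorch's torch.optim.Adam convention (assumption).
adam_config = {"betas": (0.9, 0.999), "eps": 1e-8}

# Reported search grid: 2 learning rates x 3 weight decay strengths,
# giving 6 candidate runs per task and per baseline.
learning_rates = [1e-2, 1e-3]
weight_decays = [0.0, 1e-5, 1e-4]

search_grid = [
    {"lr": lr, "weight_decay": wd, **adam_config}
    for lr, wd in itertools.product(learning_rates, weight_decays)
]

print(len(search_grid))  # 6
```

Each configuration would then be passed to the optimizer for one training run, with the winner chosen by validation performance on that task.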