Model Zoo: A Growing Brain That Learns Continually

Authors: Rahul Ramesh, Pratik Chaudhari

ICLR 2022

Reproducibility assessment: each variable below lists the result together with the supporting LLM response (evidence quoted from the paper).

Research Type: Experimental
"We use statistical learning theory and experimental analysis to show how multiple tasks can interact with each other in a non-trivial fashion when a single model is trained on them. We demonstrate that Model Zoo obtains large gains in accuracy on a wide variety of continual learning benchmark problems. We comprehensively evaluate Model Zoo on existing task-incremental continual learning benchmark problems and show comparisons with existing methods."

Researcher Affiliation: Academia
"Rahul Ramesh & Pratik Chaudhari, University of Pennsylvania, {rahulram,pratikac}@seas.upenn.edu"

Pseudocode: No
The paper describes the Model Zoo algorithm in prose and equations (e.g., Equation 8) but does not provide a structured pseudocode or algorithm block.

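In lieu of an algorithm block, the following plain-Python sketch reconstructs the training loop from the paper's prose description: each episode adds one small multi-task learner trained on the new task plus the previously seen tasks with the highest current training loss, and predictions average all learners trained on a given task. All names here (train_multitask_learner, compute_train_loss, etc.) are hypothetical placeholders, not the authors' code.

```python
def train_model_zoo(tasks, compute_train_loss, train_multitask_learner, b=5):
    """Grow a zoo of small multi-task models, one per episode (boosting-style sketch)."""
    zoo = []    # list of (model, set of tasks it was trained on)
    seen = []   # tasks observed so far
    for new_task in tasks:                      # one new task arrives per episode
        seen.append(new_task)
        # Rank previously seen tasks by the zoo's current training loss (hardest first).
        losses = {t: compute_train_loss(zoo, t) for t in seen}
        num_picked = min(len(seen), b)
        chosen = sorted(seen, key=lambda t: losses[t], reverse=True)[:num_picked]
        if new_task not in chosen:              # the newest task is always trained on
            chosen[-1] = new_task
        model = train_multitask_learner(chosen)  # small multi-head network (placeholder)
        zoo.append((model, set(chosen)))
    return zoo

def zoo_predict(zoo, task, x):
    # Average the outputs of every learner in the zoo that was trained on this task.
    outputs = [model(task, x) for model, trained_on in zoo if task in trained_on]
    return sum(outputs) / len(outputs)
```
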
Open Source Code: Yes
"To ensure the reproducibility of our work, the full source code is available at https://github.com/rahul13ramesh/modelzoo_continual."

Open Datasets: Yes
"We evaluate on Rotated-MNIST (Lopez-Paz and Ranzato, 2017), Split-MNIST (Zenke et al., 2017), Permuted-MNIST (Kirkpatrick et al., 2017), Split-CIFAR10 (Zenke et al., 2017), Split-CIFAR100 (Zenke et al., 2017), Coarse-CIFAR100 (Rosenbaum et al., 2017; Yoon et al., 2019; Shanahan et al., 2021) and Split-miniImagenet (Vinyals et al., 2016; Chaudhry et al., 2019b)."

Dataset Splits: Yes
"Split-miniImagenet ... 20% of the samples are used as the validation set." "We compare algorithms in terms of the validation accuracy averaged across all tasks at the end of all episodes..."

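A minimal sketch of this protocol as we read it: each task holds out 20% of its samples for validation, and methods are compared by the mean per-task validation accuracy after the final episode. The task_accuracy helper is a hypothetical placeholder.

```python
import random

def split_train_val(samples, val_frac=0.2, seed=0):
    """Hold out a fraction of a task's samples as its validation set."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n_val = int(len(samples) * val_frac)
    return samples[n_val:], samples[:n_val]          # (train, validation)

def mean_final_validation_accuracy(zoo, val_sets, task_accuracy):
    """Average per-task validation accuracy at the end of all episodes."""
    accs = [task_accuracy(zoo, task, data) for task, data in val_sets.items()]
    return sum(accs) / len(accs)
```
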
Hardware Specification: Yes
"All entries for inference time in Table 2 were computed by us on an Nvidia V100 GPU and therefore they can be compared directly with each other."

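The paper states only the GPU used; a common way such per-batch inference times are measured in PyTorch is with CUDA events, as in this illustrative snippet (not the authors' measurement code).

```python
import torch

@torch.no_grad()
def time_inference_ms(model, batch, warmup=10, iters=100):
    """Average milliseconds per forward pass on the current CUDA device."""
    model.eval().cuda()
    batch = batch.cuda()
    for _ in range(warmup):          # warm-up iterations exclude one-time CUDA costs
        model(batch)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        model(batch)
    end.record()
    torch.cuda.synchronize()         # wait until all queued kernels have finished
    return start.elapsed_time(end) / iters
```
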
Software Dependencies: No
"Ray Tune (Liaw et al., 2018) was used for hyper-parameter tuning" and "The Async Successive Halving Algorithm (ASHA) scheduler (Li et al., 2018) was used to prune hyper-parameter choices with the search space determined by Nevergrad (Rapin and Teytaud, 2018)." The paper names the software tools it used but does not give version numbers for them (e.g., "Ray Tune X.Y.Z").

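For readers reconstructing the environment, the tools named above are typically wired together as below; module paths follow Ray 1.x, and the trainable is a dummy placeholder rather than the authors' training code.

```python
import nevergrad as ng
from ray import tune
from ray.tune.schedulers import ASHAScheduler
from ray.tune.suggest.nevergrad import NevergradSearch

def train_fn(config):
    # Placeholder objective: the real setup would train the network with the
    # sampled hyper-parameters and report its validation accuracy.
    val_acc = 1.0 - abs(config["lr"] - 0.01)
    tune.report(val_acc=val_acc)

search_space = {
    "lr": tune.loguniform(1e-4, 1e-1),
    "weight_decay": tune.loguniform(1e-6, 1e-3),
}

analysis = tune.run(
    train_fn,
    config=search_space,
    num_samples=50,
    metric="val_acc",
    mode="max",
    scheduler=ASHAScheduler(),                                   # prunes weak trials early
    search_alg=NevergradSearch(optimizer=ng.optimizers.OnePlusOne),
)
print(analysis.best_config)
```
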
Experiment Setup: Yes
"The final values of training hyper-parameters that were chosen are, learning-rate of 0.01, mini-batch size of 16, dropout probability of 0.2 and weight-decay of 10^-5." Model Zoo uses b = min(k, 5) tasks at each round of continual learning, where k is the number of tasks seen so far; for benchmarks with only 5 tasks (the MNIST variants), b = 2 is used.

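For convenience, the quoted values can be collected into a single configuration sketch; the key names and the exact handling of b below are our assumptions, not the released code.

```python
# Final training hyper-parameters reported in the paper, gathered in one place.
config = {
    "learning_rate": 0.01,
    "minibatch_size": 16,
    "dropout": 0.2,
    "weight_decay": 1e-5,
}

def tasks_per_round(num_tasks_seen: int, benchmark_num_tasks: int) -> int:
    # b = min(k, 5) in general; benchmarks with only 5 tasks (MNIST variants) use b = 2.
    cap = 2 if benchmark_num_tasks == 5 else 5
    return min(num_tasks_seen, cap)
```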