Model Zoo: A Growing Brain That Learns Continually
Authors: Rahul Ramesh, Pratik Chaudhari
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use statistical learning theory and experimental analysis to show how multiple tasks can interact with each other in a non-trivial fashion when a single model is trained on them. We demonstrate that Model Zoo obtains large gains in accuracy on a wide variety of continual learning benchmark problems. We comprehensively evaluate Model Zoo on existing task-incremental continual learning benchmark problems and show comparisons with existing methods. |
| Researcher Affiliation | Academia | Rahul Ramesh & Pratik Chaudhari University of Pennsylvania {rahulram,pratikac}@seas.upenn.edu |
| Pseudocode | No | The paper describes the Model Zoo algorithm in prose and equations (e.g., Equation 8) but does not provide it in a structured pseudocode or algorithm block format. |
| Open Source Code | Yes | To ensure the reproducibility of our work, the full source code is available at https://github.com/rahul13ramesh/modelzoo_continual. |
| Open Datasets | Yes | We evaluate on Rotated-MNIST (Lopez-Paz and Ranzato, 2017), Split-MNIST (Zenke et al., 2017), Permuted-MNIST (Kirkpatrick et al., 2017), Split-CIFAR10 (Zenke et al., 2017), Split-CIFAR100 (Zenke et al., 2017), Coarse-CIFAR100 (Rosenbaum et al., 2017; Yoon et al., 2019; Shanahan et al., 2021) and Split-mini Imagenet (Vinyals et al., 2016; Chaudhry et al., 2019b). |
| Dataset Splits | Yes | Split-mini Imagenet ... 20% of the samples are used as the validation set. We compare algorithms in terms of the validation accuracy averaged across all tasks at the end of all episodes... |
| Hardware Specification | Yes | All entries for inference time in Table 2 were computed by us on an Nvidia V100 GPU and therefore they can be compared directly with each other. |
| Software Dependencies | No | Ray Tune (Liaw et al., 2018) was used for hyper-parameter tuning, and the Asynchronous Successive Halving Algorithm (ASHA) scheduler (Li et al., 2018) was used to prune hyper-parameter choices, with the search space determined by Nevergrad (Rapin and Teytaud, 2018). The paper names the software tools used but does not provide specific version numbers for them (e.g., "Ray Tune X.Y.Z"). |
| Experiment Setup | Yes | The final values of training hyper-parameters that were chosen are: learning rate of 0.01, mini-batch size of 16, dropout probability of 0.2, and weight decay of 10^-5. Model Zoo uses b = min(k, 5) at each round of continual learning, where k is the number of tasks; for benchmarks with only 5 tasks (the MNIST variants), b = 2 is used. |
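The experiment-setup row above can be summarized as a small configuration sketch. This is a minimal, hedged illustration of the reported hyper-parameters and the per-round task-count rule b = min(k, 5); all names (`hparams`, `tasks_per_round`) are hypothetical and not taken from the released code.

```python
# Hypothetical configuration mirroring the hyper-parameters reported in the paper.
hparams = {
    "learning_rate": 0.01,   # final chosen learning rate
    "batch_size": 16,        # mini-batch size
    "dropout": 0.2,          # dropout probability
    "weight_decay": 1e-5,    # weight decay (10^-5)
}

def tasks_per_round(k: int, total_tasks: int) -> int:
    """Number of tasks b trained per round: min(k, 5) in general,
    but b = 2 for benchmarks with only 5 tasks (MNIST variants)."""
    if total_tasks == 5:
        return 2
    return min(k, 5)
```

For example, on a 20-task benchmark the third round uses `tasks_per_round(3, 20) == 3` tasks, while on Split-MNIST (5 tasks) every round uses 2.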