A Probabilistic Framework for Modular Continual Learning
Authors: Lazar Valkov, Akash Srivastava, Swarat Chaudhuri, Charles Sutton
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate PICLE using two benchmark suites designed to assess different desiderata of CL techniques. Comparing to a wide range of approaches, we show that PICLE is the first modular CL algorithm to achieve perceptual, few-shot and latent transfer while scaling well to large search spaces, outperforming previous state-of-the-art modular CL approaches on long problem sequences. |
| Researcher Affiliation | Collaboration | Lazar Valkov1, Akash Srivastava1, Swarat Chaudhuri2, Charles Sutton3 (1MIT-IBM Watson AI Lab, 2UT Austin, 3University of Edinburgh) |
| Pseudocode | Yes | Algorithm 1: PICLE Algorithm 2: FINDBESTPTPATH: Searching through perceptual-transfer paths Algorithm 3: FINDBESTNTPATH: Searching through latent-transfer paths |
| Open Source Code | Yes | Finally, PICLE's source code is available at https://github.com/LazarValkov/PICLE. |
| Open Datasets | Yes | We evaluate PICLE on the popular CTrL benchmark suite (Veniat et al., 2020), as well as a new extension of CTrL, which we call BELL... The CTrL benchmark suite was introduced in Veniat et al. (2020). They define a number of sequences, based on seven image classification tasks, namely: CIFAR10 and CIFAR100 (Krizhevsky et al., 2009), DTD (Cimpoi et al., 2014), SVHN (Netzer et al., 2011), MNIST (LeCun et al., 1998), Rainbow MNIST (Finn et al., 2019), and Fashion MNIST (Xiao et al., 2017). |
| Dataset Splits | Yes | For problems with Ψ, we use the triple n_val = (10, 20, 10) for generating the validation dataset. For the rest of the problems, we use the triple n_val = (5000, All_val, All). Finally, we generate all test datasets using the triple n_test = (5000, All_test, All). We apply early stopping, based on the validation loss. We stop after 6000 updates without improvement and return the parameters which were logged to have had the best validation accuracy during training. |
| Hardware Specification | Yes | All experiments are run on a single machine with two Tesla P100 GPUs with 16 GB VRAM, a 64-core Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz, and 377 GB RAM. |
| Software Dependencies | Yes | All experiments are implemented using PyTorch 1.11.0 (Paszke et al., 2019). We also use GPy's (GPy, since 2012) implementation of a Gaussian process. |
| Experiment Setup | Yes | Our hyperparameters are listed in Appendix G. For each baseline, we assess the performance on a held-out test dataset. PICLE: When searching through PT paths, we use a prior with softmax temperature (T = 0.001) for BELL and (T = 0.6247744509446062) for CTrL. When approximating a module's input distribution, we project its inputs to k = 20 dimensions. |
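The early-stopping rule quoted under Dataset Splits (stop after 6000 updates without validation-loss improvement; return the parameters with the best logged validation accuracy) can be sketched as below. This is a minimal illustration, not PICLE's actual training code: `step_fn`, `evaluate_fn`, and `get_params_fn` are hypothetical placeholders for one optimizer update, a validation pass, and a parameter snapshot.

```python
def train_with_early_stopping(step_fn, evaluate_fn, get_params_fn,
                              patience=6000, max_updates=100_000):
    """Run updates until `patience` consecutive updates pass without
    the validation loss improving; return the parameters that had the
    best validation accuracy observed during training.

    step_fn() performs one update; evaluate_fn() returns
    (val_loss, val_acc); get_params_fn() snapshots parameters.
    All three are placeholders, not PICLE's real API.
    """
    best_loss = float("inf")
    best_acc = -1.0
    best_params = get_params_fn()
    since_improvement = 0
    for _ in range(max_updates):
        step_fn()
        val_loss, val_acc = evaluate_fn()
        # Patience counter tracks the validation *loss* ...
        if val_loss < best_loss:
            best_loss = val_loss
            since_improvement = 0
        else:
            since_improvement += 1
            if since_improvement >= patience:
                break
        # ... while the returned snapshot tracks the best *accuracy*.
        if val_acc > best_acc:
            best_acc = val_acc
            best_params = get_params_fn()
    return best_params, best_acc
```

Note the two criteria are deliberately decoupled, matching the quoted text: patience is measured on the loss, but the checkpoint that is returned is the one with the best accuracy.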
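The Experiment Setup row mentions a prior with a softmax temperature T over PT paths (T = 0.001 for BELL, T ≈ 0.625 for CTrL). A small sketch of temperature-scaled softmax, assuming only that the prior normalizes per-path scores this way (the score function itself is PICLE-specific and not shown):

```python
import math

def softmax_with_temperature(scores, T):
    """Temperature-scaled softmax over a list of real-valued scores.
    Low T (e.g. 0.001) sharpens the distribution toward the argmax;
    higher T (e.g. 0.62) keeps it closer to uniform. Illustrative
    only; how PICLE scores candidate paths is defined in the paper.
    """
    # Subtract the max for numerical stability before exponentiating.
    m = max(s / T for s in scores)
    exps = [math.exp(s / T - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

With the BELL setting T = 0.001, even a small score gap makes the prior nearly deterministic, whereas the CTrL setting spreads probability mass across more candidate paths.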