A Probabilistic Framework for Modular Continual Learning

Authors: Lazar Valkov, Akash Srivastava, Swarat Chaudhuri, Charles Sutton

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate PICLE using two benchmark suites designed to assess different desiderata of CL techniques. Comparing to a wide range of approaches, we show that PICLE is the first modular CL algorithm to achieve perceptual, few-shot and latent transfer while scaling well to large search spaces, outperforming previous state-of-the-art modular CL approaches on long problem sequences.
Researcher Affiliation | Collaboration | Lazar Valkov¹, Akash Srivastava¹, Swarat Chaudhuri², Charles Sutton³; ¹MIT-IBM Watson AI Lab, ²UT Austin, ³University of Edinburgh
Pseudocode | Yes | Algorithm 1: PICLE; Algorithm 2: FINDBESTPTPATH (searching through perceptual-transfer paths); Algorithm 3: FINDBESTNTPATH (searching through latent-transfer paths)
Open Source Code | Yes | Finally, PICLE's source code is available at https://github.com/LazarValkov/PICLE.
Open Datasets | Yes | We evaluate PICLE on the popular CTrL benchmark suite (Veniat et al., 2020), as well as a new extension of CTrL, which we call BELL... The CTrL benchmark suite was introduced in Veniat et al. (2020). They define a number of sequences, based on seven image classification tasks, namely: CIFAR10 and CIFAR100 (Krizhevsky et al., 2009), DTD (Cimpoi et al., 2014), SVHN (Netzer et al., 2011), MNIST (LeCun et al., 1998), Rainbow MNIST (Finn et al., 2019), and Fashion-MNIST (Xiao et al., 2017).
Dataset Splits | Yes | For problems with Ψ, we use the triple n_val = (10, 20, 10) for generating the validation dataset. For the rest of the problems, we use the triple n_val = (5000, All_val, All). Finally, we generate all test datasets using the triple n_test = (5000, All_test, All). We apply early stopping based on the validation loss: we stop after 6000 updates without improvement and return the parameters logged as having the best validation accuracy during training.
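The early-stopping rule in the dataset-splits row above can be sketched as a small helper. This is a hypothetical simplification, not the paper's code: it scans per-update validation accuracies and stops after a patience window with no new best (the original stops on validation loss and restores checkpointed model parameters).

```python
def early_stopping_best(val_accuracies, patience=6000):
    """Return (best_step, best_acc) for the checkpoint with the highest
    validation accuracy, stopping once `patience` consecutive updates
    pass without an improvement.

    `val_accuracies`: iterable of per-update validation accuracies
    (a stand-in for evaluating the model after each update).
    """
    best_step, best_acc, since_best = -1, float("-inf"), 0
    for step, acc in enumerate(val_accuracies):
        if acc > best_acc:
            # New best checkpoint: remember it and reset the patience counter.
            best_step, best_acc, since_best = step, acc, 0
        else:
            since_best += 1
            if since_best >= patience:
                break  # No improvement for `patience` updates: stop training.
    return best_step, best_acc
```

In a real training loop, "remember it" would save the model's parameters (e.g. a copy of its state dict) so they can be restored at the end.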
Hardware Specification | Yes | All experiments are run on a single machine with two Tesla P100 GPUs with 16 GB VRAM, a 64-core Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz, and 377 GB RAM.
Software Dependencies | Yes | All experiments are implemented using PyTorch 1.11.0 (Paszke et al., 2019). We also use GPy's (GPy, since 2012) implementation of a Gaussian process.
Experiment Setup | Yes | Our hyperparameters are listed in Appendix G. For each baseline, we assess performance on a held-out test dataset. For PICLE: when searching through PT paths, we use a prior with softmax temperature T = 0.001 for BELL and T = 0.6247744509446062 for CTrL. When approximating a module's input distribution, we project its inputs to k = 20 dimensions.
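The softmax temperatures quoted above control how concentrated the prior over paths is. A minimal, numerically stable sketch (the function name and score inputs are illustrative, not taken from PICLE's code) shows the effect: a very low temperature such as T = 0.001 makes the prior nearly one-hot, while T closer to 1 keeps it smooth.

```python
import math

def softmax_with_temperature(scores, T):
    """Temperature-scaled softmax: softmax(s / T).

    Subtracting the max before exponentiating avoids overflow,
    which matters at low temperatures like T = 0.001.
    """
    scaled = [s / T for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]
```

For example, with scores [1.0, 2.0, 3.0], T = 0.001 puts essentially all probability mass on the highest-scoring entry, whereas T = 1.0 gives the ordinary softmax distribution.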