Continual Learning via Sequential Function-Space Variational Inference
Authors: Tim G. J. Rudner, Freddie Bickford Smith, Qixuan Feng, Yee Whye Teh, Yarin Gal
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that S-FSVI outperforms existing objective-based continual learning methods, in some cases by a significant margin, on a wide range of task sequences, including single-head split MNIST, multi-head split CIFAR, and multi-head sequential Omniglot. We further present empirical results that showcase the usefulness of learned variational variance parameters and demonstrate that S-FSVI is less reliant on careful selection of datapoints that summarize past tasks than other methods. |
| Researcher Affiliation | Academia | Tim G. J. Rudner, Freddie Bickford Smith, Qixuan Feng, Yee Whye Teh, Yarin Gal (University of Oxford, Oxford, UK). Correspondence to: Tim G. J. Rudner <tim.rudner@cs.ox.ac.uk>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code can be accessed at: https://timrudner.com/sfsvi-code. |
| Open Datasets | Yes | First is split MNIST (S-MNIST), in which each task consists of binary classification on a pair of MNIST classes (0 vs. 1, 2 vs. 3, and so on). Second is split Fashion MNIST (S-FMNIST), which has the same structure but uses data from Fashion MNIST, posing a harder problem. Third is permuted MNIST (P-MNIST), in which each task consists of ten-way classification on MNIST images whose pixels have been randomly reordered. ... Sequential Omniglot (Lake et al., 2015; Schwarz et al., 2018) ... split CIFAR (Pan et al., 2020; Zenke et al., 2017). |
| Dataset Splits | Yes | The prior variance is optimized via hyperparameter selection on a validation set. ... For S-FSVI (optimized) in Table 1, we used the optimized hyperparameters chosen on a validation set after exploring the configurations shown in Table 4. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | We use the Adam optimizer with an initial learning rate of 0.0005 (β1 = 0.9, β2 = 0.999) and a batch size of 128. ... We set the prior covariance as Σ0 = 0.1 and train the neural network for 250 epochs on each task. ... The number of epochs on each task is 60 for split MNIST (MH), 60 for split Fashion MNIST (MH), 10 for permuted MNIST (SH) and 80 for split MNIST (SH). |
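The "Open Datasets" row above describes how the task sequences are built: split MNIST pairs up classes (0 vs. 1, 2 vs. 3, and so on), while permuted MNIST applies a fixed random pixel reordering per task. The sketch below illustrates that construction only; it is not the authors' pipeline (their code is at the linked repository), and the function names `make_split_tasks` and `make_permuted_tasks` are hypothetical.

```python
import numpy as np

def make_split_tasks(images, labels, classes_per_task=2):
    # Split-MNIST-style tasks: partition the class set into consecutive
    # pairs (0 vs. 1, 2 vs. 3, ...) and keep only the matching examples.
    tasks = []
    all_classes = np.unique(labels)
    for start in range(0, len(all_classes), classes_per_task):
        task_classes = all_classes[start:start + classes_per_task]
        mask = np.isin(labels, task_classes)
        tasks.append((images[mask], labels[mask]))
    return tasks

def make_permuted_tasks(images, labels, num_tasks, seed=0):
    # Permuted-MNIST-style tasks: each task uses the full dataset but with
    # a fixed random reordering of the pixels applied to every image.
    rng = np.random.default_rng(seed)
    flat = images.reshape(len(images), -1)
    tasks = []
    for _ in range(num_tasks):
        perm = rng.permutation(flat.shape[1])
        tasks.append((flat[:, perm], labels))
    return tasks

# Tiny demo with random arrays standing in for MNIST (28x28 images, 10 classes).
images = np.random.rand(100, 28, 28)
labels = np.random.randint(0, 10, size=100)
split_tasks = make_split_tasks(images, labels)                # 5 binary tasks
permuted_tasks = make_permuted_tasks(images, labels, num_tasks=10)
```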
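The "Experiment Setup" row collects the reported hyperparameters (Adam with an initial learning rate of 0.0005, β1 = 0.9, β2 = 0.999, batch size 128, prior covariance Σ0 = 0.1, and per-benchmark epoch counts). A minimal sketch of wiring those values into a JAX/optax training step is given below, assuming the JAX ecosystem matches the authors' released code; the dummy parameters and `loss_fn` are placeholders and do not reproduce the S-FSVI objective (which combines an expected log-likelihood with a function-space KL term).

```python
import jax
import jax.numpy as jnp
import optax

# Hyperparameters as reported in the "Experiment Setup" row.
LEARNING_RATE = 5e-4        # initial Adam learning rate
BETA_1, BETA_2 = 0.9, 0.999
BATCH_SIZE = 128
PRIOR_VARIANCE = 0.1        # Sigma_0; 250 epochs per task are reported alongside it
EPOCHS_PER_TASK = {         # per-benchmark epoch counts from the quoted setup
    "split_mnist_mh": 60,
    "split_fmnist_mh": 60,
    "permuted_mnist_sh": 10,
    "split_mnist_sh": 80,
}

optimizer = optax.adam(learning_rate=LEARNING_RATE, b1=BETA_1, b2=BETA_2)

# Dummy parameters stand in for the variational parameters of the network.
params = {"w": jnp.zeros((784, 10)), "b": jnp.zeros(10)}
opt_state = optimizer.init(params)

def loss_fn(params, x, y):
    # Placeholder cross-entropy loss; not the S-FSVI objective.
    logits = x @ params["w"] + params["b"]
    return optax.softmax_cross_entropy_with_integer_labels(logits, y).mean()

# One illustrative update step on a dummy batch.
x = jnp.ones((BATCH_SIZE, 784))
y = jnp.zeros(BATCH_SIZE, dtype=jnp.int32)
grads = jax.grad(loss_fn)(params, x, y)
updates, opt_state = optimizer.update(grads, opt_state, params)
params = optax.apply_updates(params, updates)
```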