Continual Learning via Sequential Function-Space Variational Inference
Authors: Tim G. J. Rudner, Freddie Bickford Smith, Qixuan Feng, Yee Whye Teh, Yarin Gal
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that S-FSVI outperforms existing objective-based continual learning methods, in some cases by a significant margin, on a wide range of task sequences, including single-head split MNIST, multi-head split CIFAR, and multi-head sequential Omniglot. We further present empirical results that showcase the usefulness of learned variational variance parameters and demonstrate that S-FSVI is less reliant on careful selection of datapoints that summarize past tasks than other methods. |
| Researcher Affiliation | Academia | Tim G. J. Rudner, Freddie Bickford Smith, Qixuan Feng, Yee Whye Teh, Yarin Gal (University of Oxford, Oxford, UK). Correspondence to: Tim G. J. Rudner <tim.rudner@cs.ox.ac.uk>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code can be accessed at: https://timrudner.com/sfsvi-code. |
| Open Datasets | Yes | First is split MNIST (S-MNIST), in which each task consists of binary classification on a pair of MNIST classes (0 vs. 1, 2 vs. 3, and so on). Second is split Fashion MNIST (S-FMNIST), which has the same structure but uses data from Fashion MNIST, posing a harder problem. Third is permuted MNIST (P-MNIST), in which each task consists of ten-way classification on MNIST images whose pixels have been randomly reordered. ... Sequential Omniglot (Lake et al., 2015; Schwarz et al., 2018) ... split CIFAR (Pan et al., 2020; Zenke et al., 2017). |
| Dataset Splits | Yes | The prior variance is optimized via hyperparameter selection on a validation set. ... For S-FSVI (optimized) in Table 1, we used the optimized hyperparameters chosen on a validation set after exploring the configurations shown in Table 4. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | We use the Adam optimizer with an initial learning rate of 0.0005 (β1 = 0.9, β2 = 0.999) and a batch size of 128. ... We set the prior covariance as Σ0 = 0.1 and train the neural network for 250 epochs on each task. ... The number of epochs on each task is 60 for split MNIST (MH), 60 for split Fashion MNIST (MH), 10 for permuted MNIST (SH) and 80 for split MNIST (SH). |
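The "Open Datasets" row above describes how the task sequences are built: split MNIST pairs up classes (0 vs. 1, 2 vs. 3, and so on), while permuted MNIST applies a fixed random pixel reordering per task. The sketch below illustrates that construction only; it is not the authors' pipeline (their code is at the linked repository), and the function names `make_split_tasks` and `make_permuted_tasks` are hypothetical.

```python
import numpy as np

def make_split_tasks(images, labels, classes_per_task=2):
    # Split-MNIST-style tasks: partition the class set into consecutive
    # pairs (0 vs. 1, 2 vs. 3, ...) and keep only the matching examples.
    tasks = []
    all_classes = np.unique(labels)
    for start in range(0, len(all_classes), classes_per_task):
        task_classes = all_classes[start:start + classes_per_task]
        mask = np.isin(labels, task_classes)
        tasks.append((images[mask], labels[mask]))
    return tasks

def make_permuted_tasks(images, labels, num_tasks, seed=0):
    # Permuted-MNIST-style tasks: each task uses the full dataset but with
    # a fixed random reordering of the pixels applied to every image.
    rng = np.random.default_rng(seed)
    flat = images.reshape(len(images), -1)
    tasks = []
    for _ in range(num_tasks):
        perm = rng.permutation(flat.shape[1])
        tasks.append((flat[:, perm], labels))
    return tasks

# Tiny demo with random arrays standing in for MNIST (28x28 images, 10 classes).
images = np.random.rand(100, 28, 28)
labels = np.random.randint(0, 10, size=100)
split_tasks = make_split_tasks(images, labels)                # 5 binary tasks
permuted_tasks = make_permuted_tasks(images, labels, num_tasks=10)
```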
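The "Experiment Setup" row collects the reported hyperparameters (Adam with an initial learning rate of 0.0005, β1 = 0.9, β2 = 0.999, batch size 128, prior covariance Σ0 = 0.1, and per-benchmark epoch counts). A minimal sketch of wiring those values into a JAX/optax training step is given below, assuming the JAX ecosystem matches the authors' released code; the dummy parameters and `loss_fn` are placeholders and do not reproduce the S-FSVI objective (which combines an expected log-likelihood with a function-space KL term).

```python
import jax
import jax.numpy as jnp
import optax

# Hyperparameters as reported in the "Experiment Setup" row.
LEARNING_RATE = 5e-4        # initial Adam learning rate
BETA_1, BETA_2 = 0.9, 0.999
BATCH_SIZE = 128
PRIOR_VARIANCE = 0.1        # Sigma_0; 250 epochs per task are reported alongside it
EPOCHS_PER_TASK = {         # per-benchmark epoch counts from the quoted setup
    "split_mnist_mh": 60,
    "split_fmnist_mh": 60,
    "permuted_mnist_sh": 10,
    "split_mnist_sh": 80,
}

optimizer = optax.adam(learning_rate=LEARNING_RATE, b1=BETA_1, b2=BETA_2)

# Dummy parameters stand in for the variational parameters of the network.
params = {"w": jnp.zeros((784, 10)), "b": jnp.zeros(10)}
opt_state = optimizer.init(params)

def loss_fn(params, x, y):
    # Placeholder cross-entropy loss; not the S-FSVI objective.
    logits = x @ params["w"] + params["b"]
    return optax.softmax_cross_entropy_with_integer_labels(logits, y).mean()

# One illustrative update step on a dummy batch.
x = jnp.ones((BATCH_SIZE, 784))
y = jnp.zeros(BATCH_SIZE, dtype=jnp.int32)
grads = jax.grad(loss_fn)(params, x, y)
updates, opt_state = optimizer.update(grads, opt_state, params)
params = optax.apply_updates(params, updates)
```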