Generalized Variational Continual Learning
Authors: Noel Loo, Siddharth Swaroop, Richard E. Turner
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, in Section 5 we test GVCL and GVCL with FiLM layers on many standard benchmarks... |
| Researcher Affiliation | Academia | Noel Loo, Siddharth Swaroop & Richard E. Turner University of Cambridge {nl355,ss2163,ret26}@cam.ac.uk |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/yolky/gvcl |
| Open Datasets | Yes | It is derived from the HASYv2 dataset (Thoma, 2017)... For our Split-MNIST experiment, in addition to the standard 5 binary classification tasks for Split MNIST, we add 5 more binary classification tasks by taking characters from the KMNIST dataset (Clanuwat et al., 2018)... The popular Split-CIFAR dataset, introduced in Zenke et al. (2017)... |
| Dataset Splits | Yes | Early stopping based on the validation set was used. 10% of the training set was used as validation for these methods, and for Easy and Hard CHASY, 8 samples per class form the validation set (which are disjoint from the training samples or test samples). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using a Github repository for HAT but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | Baseline MAP algorithms were trained with SGD with a decaying learning rate starting at 5e-2, with a maximum of 200 epochs per task... For VI models, we used the Adam optimizer with a learning rate of 1e-4 for Split-MNIST and Mixture, and 1e-3 for Easy-CHASY, Hard-CHASY and Split-CIFAR... All experiments (both the baselines and VI methods) use a batch size of 64... Table 3: Best (selected) hyperparameters for continual learning experiments for various algorithms. (A minimal sketch of this configuration follows the table.) |
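
The Dataset Splits and Experiment Setup rows above pin down the reported training configuration: a 10% validation hold-out with early stopping, SGD with a decaying learning rate starting at 5e-2 for the MAP baselines, Adam at 1e-4 (Split-MNIST, Mixture) or 1e-3 (Easy-CHASY, Hard-CHASY, Split-CIFAR) for the VI models, batch size 64, and at most 200 epochs per task. The sketch below illustrates that configuration in generic PyTorch; it is not the authors' released code (see https://github.com/yolky/gvcl for that), and the decay schedule, helper names (`make_loaders`, `make_optimizer`), and usage skeleton are illustrative assumptions.

```python
# Minimal sketch of the training configuration reported in the table above.
# NOT the authors' released implementation (see github.com/yolky/gvcl);
# the decay schedule, helper names, and usage skeleton are assumptions.
from torch import optim
from torch.utils.data import DataLoader, random_split


def make_loaders(dataset, batch_size=64, val_fraction=0.10):
    """Hold out 10% of the training data as a validation set for early stopping."""
    n_val = int(len(dataset) * val_fraction)
    train_set, val_set = random_split(dataset, [len(dataset) - n_val, n_val])
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=batch_size)
    return train_loader, val_loader


def make_optimizer(model, method, benchmark):
    """Optimizer settings quoted in the Experiment Setup row."""
    if method == "map_baseline":
        # SGD with a decaying learning rate starting at 5e-2
        # (the exact decay schedule is not given here; exponential decay is assumed).
        opt = optim.SGD(model.parameters(), lr=5e-2)
        sched = optim.lr_scheduler.ExponentialLR(opt, gamma=0.95)
        return opt, sched
    # VI models: Adam with 1e-4 for Split-MNIST and Mixture, 1e-3 otherwise.
    lr = 1e-4 if benchmark in {"split-mnist", "mixture"} else 1e-3
    return optim.Adam(model.parameters(), lr=lr), None


# Hypothetical per-task usage, batch size 64, at most 200 epochs with early stopping:
#   train_loader, val_loader = make_loaders(task_dataset)
#   opt, sched = make_optimizer(model, "vi", "split-mnist")
#   for epoch in range(200):
#       ...train one epoch, evaluate on val_loader, stop if validation stops improving...
```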