Variational Auto-Regressive Gaussian Processes for Continual Learning
Authors: Sanyam Kapoor, Theofanis Karaletsos, Thang D Bui
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments Through our experiments, we highlight the qualitative characteristics of the derived generalized learning objective (8), and provide evidence for the competitiveness of VAR-GPs compared to our main baseline Improved Variational Continual Learning (VCL) (Nguyen et al., 2018; Swaroop et al., 2019), among others. A thorough ablation study demonstrates the efficacy of our modeling choices. |
| Researcher Affiliation | Collaboration | ¹Center for Data Science, New York University, New York, NY, USA; ²Facebook Inc., Menlo Park, CA, USA; ³University of Sydney, Sydney, NSW, Australia. |
| Pseudocode | Yes | Algorithm 1 VAR-GP per-task training |
| Open Source Code | Yes | The full reference implementation of VAR-GPs in PyTorch (Paszke et al., 2019) is publicly available at u.perhapsbay.es/vargp-code. |
| Open Datasets | Yes | Split MNIST Following Zenke et al. (2017), we consider the full 10-way classification task at each time step but receive a dataset D(t) of only a subset of MNIST digits in the sequence 0/1, 2/3, 4/5, 6/7, and 8/9. ... Permuted MNIST In this benchmark, we receive a dataset D(t) of MNIST digits at each time step t, such that the pixels undergo an unknown but fixed permutation. |
| Dataset Splits | Yes | We track validation accuracy on a subset of the training set for early stopping. ... 10000 training samples are cumulatively set aside for the validation set across all tasks. ... 10000 samples are set aside for validation. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch' and the 'Yogi optimizer' but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | To optimize both variational objectives (5) and (8), we use a mini-batch size of 512 and 3 samples from the variational distribution with the Yogi optimizer (Zaheer et al., 2018). ... We allocate 60 inducing points for each task, with a learning rate of 0.003, and β = 10.0. ... We allocate 100 inducing points for each task, with a learning rate of 0.0037, and β = 1.64. |
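
The "Open Datasets" and "Dataset Splits" rows describe the Split MNIST and Permuted MNIST task streams. The sketch below shows one common way to construct such streams in PyTorch; it is not the authors' code, and the function names `make_split_mnist_tasks` and `make_permuted_mnist_tasks` are illustrative only.

```python
# Minimal sketch (not the authors' code) of the two continual-learning
# benchmarks quoted above. Function and variable names are illustrative.
import torch
from torchvision import datasets, transforms

def make_split_mnist_tasks(root="./data"):
    """Split MNIST: five tasks over digit pairs 0/1, 2/3, 4/5, 6/7, 8/9.
    Each task keeps the full 10-way label space, as described in the paper."""
    to_tensor = transforms.ToTensor()
    train = datasets.MNIST(root, train=True, download=True, transform=to_tensor)
    tasks = []
    for digits in [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]:
        idx = [i for i, y in enumerate(train.targets.tolist()) if y in digits]
        tasks.append(torch.utils.data.Subset(train, idx))
    return tasks

def make_permuted_mnist_tasks(num_tasks=10, root="./data", seed=0):
    """Permuted MNIST: each task applies an unknown but fixed pixel permutation."""
    g = torch.Generator().manual_seed(seed)
    tasks = []
    for _ in range(num_tasks):
        perm = torch.randperm(28 * 28, generator=g)
        tfm = transforms.Compose([
            transforms.ToTensor(),
            transforms.Lambda(lambda x, p=perm: x.view(-1)[p]),
        ])
        tasks.append(datasets.MNIST(root, train=True, download=True, transform=tfm))
    return tasks
```

The "Dataset Splits" row notes that 10000 training samples are held out for validation; in a sketch like this, that would typically be done with `torch.utils.data.random_split` on each task's training subset before training.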
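The "Experiment Setup" row lists per-benchmark hyperparameters. A hedged sketch of how those settings could be wired into a per-task training loop is shown below. The `model.elbo` method is a hypothetical placeholder, the Yogi optimizer comes from the third-party `torch_optimizer` package rather than the authors' repository, and the mapping of the two hyperparameter sets to Split MNIST and Permuted MNIST is an assumption based on their order in the quoted text.

```python
# Hedged sketch of the quoted hyperparameters; not the authors' implementation.
import torch
import torch_optimizer  # third-party package providing a Yogi implementation

HPARAMS = {
    # Assumed mapping of the two quoted configurations to the two benchmarks.
    "split_mnist":    dict(num_inducing=60,  lr=0.003,  beta=10.0, batch_size=512, n_mc_samples=3),
    "permuted_mnist": dict(num_inducing=100, lr=0.0037, beta=1.64, batch_size=512, n_mc_samples=3),
}

def train_task(model, train_set, benchmark, n_epochs=1):
    """One task's training loop mirroring the quoted settings."""
    hp = HPARAMS[benchmark]
    loader = torch.utils.data.DataLoader(train_set, batch_size=hp["batch_size"], shuffle=True)
    opt = torch_optimizer.Yogi(model.parameters(), lr=hp["lr"])
    for _ in range(n_epochs):
        for x, y in loader:
            opt.zero_grad()
            # `model.elbo` is a hypothetical method returning the variational
            # objective, estimated with hp["n_mc_samples"] Monte Carlo samples
            # and with KL terms weighted by hp["beta"].
            loss = -model.elbo(x, y, n_samples=hp["n_mc_samples"], beta=hp["beta"])
            loss.backward()
            opt.step()
```

The number of inducing points per task (`num_inducing`) would be consumed by the model's constructor rather than the loop above; it is kept in `HPARAMS` only to group all quoted settings in one place.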