Continual Learning in Low-rank Orthogonal Subspaces
Authors: Arslan Chaudhry, Naeemullah Khan, Puneet Dokania, Philip Torr
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We report, for the first time, strong results over experience-replay baseline with and without memory on standard classification benchmarks in continual learning. |
| Researcher Affiliation | Collaboration | Arslan Chaudhry (1), Naeemullah Khan (1), Puneet K. Dokania (1, 2), Philip H. S. Torr (1); (1) University of Oxford, (2) Five AI Ltd., UK |
| Pseudocode | Yes | Algorithm 1: Training of ORTHOG-SUBSPACE on sequential data D = {D_1, ..., D_T}, with Θ = {W_l}_{l=1}^{L} initialized as orthonormalized matrices, P = {P_1, ..., P_T} orthogonal projections, learning rate α, s = 2, q = 0.5, ε = 10^-8. (A sketch of the per-task projections follows the table.) |
| Open Source Code | Yes | Code: https://github.com/arslan-chaudhry/orthog_subspace |
| Open Datasets | Yes | Permuted MNIST is a variant of the MNIST dataset of handwritten digits [LeCun, 1998] where each task applies a fixed random pixel permutation to the original dataset. ... Split CIFAR is a variant of the CIFAR-100 dataset [Krizhevsky and Hinton, 2009, Zenke et al., 2017] ... Split miniImageNet is a variant of the ImageNet dataset [Russakovsky et al., 2015, Vinyals et al., 2016] (A construction sketch for Permuted MNIST follows the table.) |
| Dataset Splits | No | The paper describes using initial tasks for hyper-parameter tuning but does not provide specific train/validation/test dataset splits (percentages or sample counts) for the individual datasets like MNIST, CIFAR, or ImageNet. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) used for running experiments are provided in the paper. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions or other libraries). |
| Experiment Setup | Yes | All baselines use the same neural network architectures: a fully-connected network with two hidden layers of 256 ReLU neurons in the MNIST experiments, and a standard ResNet18 [He et al., 2016] in CIFAR and ImageNet experiments. ... Batch size is set to 10 across experiments and models. A tiny ring memory of 1 example per class per task is stored for the memory-based methods. ... All experiments run for five different random seeds, each corresponding to a different dataset ordering among tasks, that are fixed across baselines. (A sketch of the MLP and ring memory follows the table.) |
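
The Algorithm 1 caption above references per-task orthogonal projections P_1, ..., P_T applied on top of orthonormalized weight matrices. Below is a minimal sketch of one way such mutually orthogonal, low-rank projectors could be constructed, assuming each task is assigned a disjoint block of a random orthonormal basis of the feature space; the feature dimension, per-task rank, and the point where P_t is applied are assumptions, and the s, q, ε hyper-parameters and the orthonormality-preserving weight updates from Algorithm 1 are not modeled here.

```python
import numpy as np

def make_task_projections(feature_dim, num_tasks, rank, seed=0):
    """Build mutually orthogonal, rank-`rank` projectors P_1..P_T.

    Disjoint column blocks of a random orthonormal basis are assigned to
    tasks, so range(P_i) is orthogonal to range(P_j) whenever i != j.
    """
    assert num_tasks * rank <= feature_dim, "subspaces must fit in the feature space"
    rng = np.random.RandomState(seed)
    # QR of a Gaussian matrix gives a random orthonormal basis Q.
    q, _ = np.linalg.qr(rng.randn(feature_dim, feature_dim))
    projections = []
    for t in range(num_tasks):
        basis = q[:, t * rank:(t + 1) * rank]   # d x r block for task t
        projections.append(basis @ basis.T)     # rank-r projector P_t
    return projections

# Example: 128-d features, 5 tasks, a 20-dimensional subspace per task.
P = make_task_projections(feature_dim=128, num_tasks=5, rank=20)
x = np.random.randn(128)
# Subspaces are mutually orthogonal: P_1 annihilates anything in range(P_0).
print(np.allclose(P[1] @ (P[0] @ x), 0.0, atol=1e-8))  # True
```

Keeping each per-task subspace low-rank leaves room in the feature space for later tasks, while the orthogonality between blocks is what limits interference across tasks.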
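
The Permuted MNIST description quoted in the Open Datasets row translates directly into a small construction routine. A minimal sketch, assuming flattened 28x28 images and an arbitrary seed; labels are left unchanged across tasks:

```python
import numpy as np

def make_permuted_mnist_tasks(images, labels, num_tasks, seed=0):
    """Permuted MNIST: each task applies one fixed random pixel
    permutation to every image in the original dataset.

    `images` is an (N, 784) array of flattened 28x28 MNIST digits;
    `labels` is the matching (N,) array of digit labels.
    """
    rng = np.random.RandomState(seed)
    tasks = []
    for _ in range(num_tasks):
        perm = rng.permutation(images.shape[1])   # one fixed permutation per task
        tasks.append((images[:, perm], labels))   # same permutation for every image
    return tasks
```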
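
The Experiment Setup row fixes the MNIST architecture (two hidden layers of 256 ReLU units) and the episodic memory budget (a ring memory of one example per class per task). A minimal PyTorch-style sketch under those constraints; the 784-d input, 10-way output, and the ring-buffer bookkeeping are assumptions, and the hypothetical RingMemory class is not the authors' implementation:

```python
import random
import torch.nn as nn

# MNIST experiments: fully connected net with two hidden layers of 256 ReLU
# units (784-d input and 10-way output are assumptions for plain MNIST).
mnist_net = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

class RingMemory:
    """Tiny episodic memory: at most `per_class` stored examples for each
    (task, class) pair, overwritten FIFO-style once a bucket is full."""

    def __init__(self, per_class=1):
        self.per_class = per_class
        self.buckets = {}                          # (task_id, label) -> examples

    def add(self, task_id, label, example):
        bucket = self.buckets.setdefault((task_id, label), [])
        if len(bucket) == self.per_class:
            bucket.pop(0)                          # drop the oldest stored example
        bucket.append(example)

    def sample(self, k):
        pool = [x for bucket in self.buckets.values() for x in bucket]
        return random.sample(pool, min(k, len(pool)))
```

With per_class=1 this matches the quoted budget of one example per class per task; memory-based baselines would mix examples drawn via sample() into the size-10 mini-batches of the current task.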