Continual Learning in Low-rank Orthogonal Subspaces

Authors: Arslan Chaudhry, Naeemullah Khan, Puneet Dokania, Philip Torr

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We report, for the first time, strong results over experience-replay baseline with and without memory on standard classification benchmarks in continual learning.
Researcher Affiliation | Collaboration | Arslan Chaudhry¹, Naeemullah Khan¹, Puneet K. Dokania¹,², Philip H. S. Torr¹ (¹University of Oxford, ²Five AI Ltd., UK)
Pseudocode | Yes | Algorithm 1: Training of ORTHOG-SUBSPACE on sequential data D = {D_1, …, D_T}, with Θ = {W_l}_{l=1}^{L} initialized as orthonormalized matrices, P = {P_1, …, P_T} orthogonal projections, learning rate α, s = 2, q = 0.5, ε = 10^-8. (A rough sketch of per-task orthogonal projections appears after the table.)
Open Source Code | Yes | Code: https://github.com/arslan-chaudhry/orthog_subspace
Open Datasets | Yes | Permuted MNIST is a variant of the MNIST dataset of handwritten digits [LeCun, 1998] where each task applies a fixed random pixel permutation to the original dataset. ... Split CIFAR is a variant of the CIFAR-100 dataset [Krizhevsky and Hinton, 2009, Zenke et al., 2017] ... Split miniImageNet is a variant of the ImageNet dataset [Russakovsky et al., 2015, Vinyals et al., 2016]. (A permuted-task sketch appears after the table.)
Dataset Splits | No | The paper describes using initial tasks for hyper-parameter tuning but does not provide specific train/validation/test dataset splits (percentages or sample counts) for the individual datasets like MNIST, CIFAR, or ImageNet.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) used for running experiments are provided in the paper.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions, or other libraries).
Experiment Setup | Yes | All baselines use the same neural network architectures: a fully-connected network with two hidden layers of 256 ReLU neurons in the MNIST experiments, and a standard ResNet18 [He et al., 2016] in CIFAR and ImageNet experiments. ... Batch size is set to 10 across experiments and models. A tiny ring memory of 1 example per class per task is stored for the memory-based methods. ... All experiments run for five different random seeds, each corresponding to a different dataset ordering among tasks, that are fixed across baselines. (An architecture and ring-memory sketch appears after the table.)
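Algorithm 1's signature references per-task orthogonal projections P_1, …, P_T. As a rough illustration only, the Python sketch below builds mutually orthogonal low-rank projectors by slicing the columns of a single random orthonormal basis; the function name, dimensions, and construction are our assumptions, not the released implementation, and the sketch does not cover keeping Θ orthonormal during training as Algorithm 1 describes.

```python
# Minimal sketch (assumptions, not the repository's API): per-task projectors
# P_k = B_k B_k^T whose column spaces are mutually orthogonal, obtained by
# slicing disjoint column blocks of one random orthonormal basis.
import numpy as np

def make_task_projections(feature_dim: int, num_tasks: int, rank: int, seed: int = 0):
    """Return feature_dim x feature_dim projection matrices, one per task."""
    assert num_tasks * rank <= feature_dim, "need enough dimensions for disjoint subspaces"
    rng = np.random.default_rng(seed)
    # QR of a random Gaussian matrix yields an orthonormal basis of R^{feature_dim}.
    Q, _ = np.linalg.qr(rng.standard_normal((feature_dim, feature_dim)))
    projections = []
    for k in range(num_tasks):
        B_k = Q[:, k * rank:(k + 1) * rank]   # rank columns reserved for task k
        projections.append(B_k @ B_k.T)       # orthogonal projector onto span(B_k)
    return projections

P = make_task_projections(feature_dim=256, num_tasks=5, rank=32)
# Subspaces are mutually orthogonal: P_i @ P_j is (numerically) zero for i != j.
print(np.abs(P[0] @ P[1]).max())
```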
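The Permuted MNIST construction quoted in the datasets row (each task applies one fixed random pixel permutation to every example) can be illustrated in a few lines. The loader interface below is hypothetical; the paper's actual data pipeline lives in the linked repository.

```python
# Minimal sketch (assumed interface): a Permuted-MNIST task is the original
# data with a single fixed pixel permutation applied to every example.
import numpy as np

def make_permuted_task(images: np.ndarray, task_seed: int) -> np.ndarray:
    """images: (N, 784) flattened MNIST; returns the same data under one fixed permutation."""
    rng = np.random.default_rng(task_seed)
    perm = rng.permutation(images.shape[1])   # one permutation per task, reused for train and test
    return images[:, perm]

# Dummy data stands in for MNIST here; real images would come from any MNIST loader.
dummy = np.random.rand(100, 784)
task_3 = make_permuted_task(dummy, task_seed=3)
```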
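The experiment-setup row can likewise be sketched. Only the layer sizes (two hidden layers of 256 ReLU units), batch size 10, and the budget of 1 example per class per task come from the paper's description; the class names, buffer layout, and overwrite policy below are our assumptions.

```python
# Minimal sketch of the described MNIST architecture plus a tiny ring memory
# holding at most one example per (task, class) pair. Assumed names and layout.
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim=784, hidden=256, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x):
        return self.net(x.flatten(1))

class RingMemory:
    """Stores at most one example per (task, class) slot; newest example overwrites."""
    def __init__(self):
        self.buffer = {}                                  # (task_id, class_id) -> (x, y)

    def add(self, task_id, x, y):
        self.buffer[(task_id, int(y))] = (x, y)

    def sample(self):
        return list(self.buffer.values())

model = MLP()
memory = RingMemory()
memory.add(task_id=0, x=torch.randn(784), y=torch.tensor(3))
out = model(torch.randn(10, 1, 28, 28))                   # batch size 10, as in the setup
```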