Contextual Transformation Networks for Online Continual Learning

Authors: Quang Pham, Chenghao Liu, Doyen Sahoo, Steven Hoi

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments show that CTN is competitive with a large-scale dynamic architecture network and consistently outperforms other fixed-architecture methods under the same standard backbone.
Researcher Affiliation | Collaboration | 1 Singapore Management University, hqpham.2017@smu.edu.sg; 2 Salesforce Research Asia, {chenghao.liu, dsahoo, shoi}@salesforce.com
Pseudocode | Yes | We provide the detailed algorithm of our CTN and its subroutines in Alg. 1.
Open Source Code | Yes | Our implementation can be found at https://github.com/phquang/Contextual-Transformation-Network.
Open Datasets | Yes | We consider four continual learning benchmarks in our experiments. Permuted MNIST (pMNIST) (Lopez-Paz & Ranzato, 2017): each task is a random but fixed permutation of the original MNIST. We generate 23 tasks with 1,000 images for training, and the testing set has the same number of images as the original MNIST data. Split CIFAR-100 (Split CIFAR) (Lopez-Paz & Ranzato, 2017) is constructed by splitting the CIFAR-100 (Krizhevsky & Hinton, 2009) dataset into 20 tasks, each of which contains 5 different classes sampled without replacement from the total of 100 classes. Split miniImageNet (Split miniIMN) (Chaudhry et al., 2019a): similarly, we split the miniImageNet dataset (Vinyals et al., 2016) into 20 disjoint tasks. Finally, we consider the CORe50 benchmark by constructing a sequence of 10 tasks using the original CORe50 dataset (Lomonaco & Maltoni, 2017).
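To make the quoted benchmark construction concrete, the following is a minimal sketch (not the authors' code) of how the Split CIFAR and pMNIST task streams could be built with torchvision. The seed, transforms, and the reuse of one 1,000-image subset across pMNIST tasks are assumptions.

```python
import numpy as np
import torch
from torch.utils.data import Subset, TensorDataset
from torchvision import datasets, transforms

def split_cifar100_tasks(root="./data", n_tasks=20, seed=0):
    """Split CIFAR-100 into n_tasks disjoint tasks (5 classes each when n_tasks=20)."""
    rng = np.random.RandomState(seed)
    class_order = rng.permutation(100)            # classes assigned to tasks without replacement
    train = datasets.CIFAR100(root, train=True, download=True,
                              transform=transforms.ToTensor())
    targets = np.array(train.targets)
    per_task = 100 // n_tasks
    tasks = []
    for t in range(n_tasks):
        classes = class_order[t * per_task:(t + 1) * per_task]
        idx = np.where(np.isin(targets, classes))[0]
        tasks.append(Subset(train, idx))
    return tasks

def permuted_mnist_tasks(root="./data", n_tasks=23, n_train=1000, seed=0):
    """Each task is a fixed random pixel permutation of MNIST, with 1,000 training images."""
    rng = np.random.RandomState(seed)
    base = datasets.MNIST(root, train=True, download=True,
                          transform=transforms.ToTensor())
    subset = rng.choice(len(base), n_train, replace=False)   # assumed: same subset per task
    tasks = []
    for _ in range(n_tasks):
        perm = torch.from_numpy(rng.permutation(28 * 28))    # fixed permutation for this task
        xs = torch.stack([base[i][0].view(-1)[perm] for i in subset])
        ys = torch.tensor([int(base[i][1]) for i in subset])
        tasks.append(TensorDataset(xs, ys))
    return tasks
```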
Dataset Splits | Yes | For CTN, the episodic memory and semantic memory are implemented as two ring buffers with sizes equal to 80% and 20% of the total budget. This configuration is also cross-validated from the validation tasks. [...] We follow the procedure proposed in Chaudhry et al. (2019a) to cross-validate all hyperparameters using the first three tasks.
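As a rough illustration of the quoted memory configuration, the sketch below splits a total budget 80% / 20% between episodic and semantic memory using simple ring (FIFO) buffers. The budget value and the sampling helper are assumptions, not the paper's implementation; per the quote, the 80/20 ratio itself is cross-validated on the validation tasks.

```python
import numpy as np
from collections import deque

class RingBuffer:
    """Fixed-size FIFO buffer: once full, the oldest stored example is overwritten."""
    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def add(self, example):
        self.data.append(example)

    def sample(self, k, rng):
        k = min(k, len(self.data))
        idx = rng.choice(len(self.data), size=k, replace=False)
        return [self.data[i] for i in idx]

total_budget = 1000                                # assumed total memory budget
episodic = RingBuffer(int(0.8 * total_budget))     # 80% of the budget
semantic = RingBuffer(int(0.2 * total_budget))     # 20% of the budget
rng = np.random.default_rng(0)
```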
Hardware Specification | Yes | Experiments are conducted using a single K80 GPU and all methods are allowed up to four gradient steps per sample.
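The quoted protocol allows each method up to four gradient steps on every incoming example. A minimal sketch of such an inner update loop is shown below; the model, loss, and any replay logic are placeholders rather than the authors' code.

```python
def online_update(model, optimizer, loss_fn, x, y, n_steps=4):
    """Take up to n_steps gradient updates on the incoming example/mini-batch (x, y)."""
    model.train()
    loss = None
    for _ in range(n_steps):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    return loss.item()
```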
Software Dependencies | Yes | All methods are implemented using PyTorch (Paszke et al., 2019) version 1.5 and CUDA 10.2.
Experiment Setup | Yes | We use a multilayer perceptron with two hidden layers of size 256 for pMNIST, a reduced ResNet18 with three times fewer filters (Lopez-Paz & Ranzato, 2017) for Split CIFAR and Split miniIMN, and a full ResNet18 on CORe50. [...] We optimize all models using SGD with a mini-batch of size ten over one epoch. [...] Appendix C.5 (Hyperparameter Selection) provides detailed hyperparameter values for each method, including learning rates, regularization strength, temperature, and number of updates.
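For reference, here is a sketch of the quoted backbones and optimizer settings. The layer sizes, batch size, and epoch count come from the text; the activation choice and learning-rate value are assumptions, and the reduced ResNet18 is only indicated in a comment.

```python
import torch.nn as nn
import torch.optim as optim

# Two-hidden-layer MLP (256 units each) for pMNIST; 10 output classes per task.
mlp = nn.Sequential(
    nn.Linear(28 * 28, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

# Split CIFAR / Split miniIMN use a reduced ResNet18 with three times fewer
# filters per layer (Lopez-Paz & Ranzato, 2017); CORe50 uses a full ResNet18.
# Both can be built by scaling the width of torchvision's resnet18 (omitted here).

optimizer = optim.SGD(mlp.parameters(), lr=0.1)   # learning rate is an assumed value
# One epoch over each task with mini-batches of size ten, as quoted, e.g.:
# loader = torch.utils.data.DataLoader(task_dataset, batch_size=10, shuffle=True)
```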