Recasting Continual Learning as Sequence Modeling

Authors: Soochan Lee, Jaehyeon Son, Gunhee Kim

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on seven benchmarks, covering both classification and regression, show that sequence models can be an attractive solution for general MCL.
Researcher Affiliation | Academia | Soochan Lee, Seoul National University, soochan.lee@vision.snu.ac.kr; Jaehyeon Son, Seoul National University, sjh9876@snu.ac.kr; Gunhee Kim, Seoul National University, gunhee@snu.ac.kr
Pseudocode | Yes | Algorithm 1: Inner loop of conventional SGD-based MCL (a generic sketch of such an inner loop follows the table)
Open Source Code | Yes | Code is available at https://github.com/soochan-lee/cl-as-seq
Open Datasets | Yes | CIFAR-100 [18], Omniglot [19], CASIA Chinese Handwriting Database (CASIA; 22), MS-Celeb-1M [10].
Dataset Splits | No | The paper states: 'The tasks are then split into two disjoint sets, one for meta-training and the other for meta-testing.' It does not explicitly mention a separate validation set or split for hyperparameter tuning, distinct from the meta-training and meta-test sets.
Hardware Specification | Yes | We compare various aspects of the computational cost using our PyTorch [27] implementation on NVIDIA A40 GPUs, which have 48 GB of VRAM.
Software Dependencies | No | The paper mentions a 'PyTorch [27] implementation' but does not specify a version number for PyTorch or any other software dependency.
Experiment Setup | Yes | By default, we set K = 20, while additionally testing the K = 100 setting to compare performances with longer episodes. For each task k, the training stream D^train_k and the test set D^test_k contain five examples each (i.e., five shots). For each experiment, we meta-train for 50K steps with a batch size of 16 (i.e., 16 episodes in parallel) and meta-test with 1,024 episodes. All the models share a similar architecture: 4 layers, 8 heads, and 512 hidden dimensions. (These reported values are also collected in a configuration sketch after the table.)
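The Pseudocode row refers to the paper's Algorithm 1, the inner loop of conventional SGD-based meta-continual learning (MCL). As a rough illustration only, the sketch below shows what such an inner loop typically looks like: a model initialized from meta-learned parameters is trained on a task stream with plain SGD and then evaluated on held-out test sets. All names (`inner_sgd_mcl`, `inner_lr`, `train_stream`, `test_sets`) are placeholders rather than the authors' API, the outer meta-update is omitted, and the linked repository should be consulted for the actual implementation.

```python
# Illustrative sketch of the inner loop of conventional SGD-based MCL,
# in the spirit of the paper's Algorithm 1. All names are placeholders,
# not the authors' code; see https://github.com/soochan-lee/cl-as-seq.
import copy
import torch
import torch.nn.functional as F


def inner_sgd_mcl(meta_model, train_stream, test_sets, inner_lr=0.01):
    """Run one continual-learning episode with plain SGD.

    train_stream: list of (x, y) batches, one per task, seen in order.
    test_sets:    list of (x, y) batches used to evaluate after training.
    """
    # Start from the meta-learned initialization without modifying it.
    model = copy.deepcopy(meta_model)
    optimizer = torch.optim.SGD(model.parameters(), lr=inner_lr)

    # Sequentially fit each task's training data (no revisiting).
    for x_train, y_train in train_stream:
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_train), y_train)
        loss.backward()
        optimizer.step()

    # Evaluate on all tasks after the stream ends. A real MCL method would
    # also pass this learning signal back to the meta-parameters (e.g., via
    # higher-order gradients), which this simplified sketch omits.
    losses = [F.cross_entropy(model(x), y).item() for x, y in test_sets]
    return sum(losses) / len(losses)
```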
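The Experiment Setup row lists concrete hyperparameters. The minimal sketch below merely collects those reported values into Python dataclasses for readability; the class and field names are assumptions, not taken from the authors' code.

```python
# Hypothetical configuration mirroring the reported setup; names are
# illustrative, only the numeric values come from the paper excerpt above.
from dataclasses import dataclass


@dataclass
class EpisodeConfig:
    num_tasks: int = 20        # K; the paper also reports a K = 100 setting
    shots_per_task: int = 5    # five training and five test examples per task


@dataclass
class ModelConfig:
    num_layers: int = 4
    num_heads: int = 8
    hidden_dim: int = 512


@dataclass
class MetaTrainConfig:
    steps: int = 50_000        # meta-training steps
    batch_size: int = 16       # episodes processed in parallel
    eval_episodes: int = 1024  # episodes used at meta-test time
```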