Continual Learners are Incremental Model Generalizers

Authors: Jaehong Yoon, Sung Ju Hwang, Yue Cao

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We conduct the experiments on various continual learning frameworks with and without supervision using the ImageNet-1K Split dataset against multiple strong baselines ... for unsupervised continual learning, demonstrating the effectiveness of our proposed method on fine-tuning performance on downstream tasks. |
| Researcher Affiliation | Collaboration | Jaehong Yoon (1,2), Sung Ju Hwang (1,3), Yue Cao (4); (1) Korea Advanced Institute of Science and Technology, (2) Microsoft Research, (3) DeepAuto, (4) Beijing Academy of Artificial Intelligence. |
| Pseudocode | Yes | Algorithm 1: Continual Pre-training with GLAD |
| Open Source Code | No | We follow the setting of SimMIM (Xie et al., 2022b) and MAE (He et al., 2022) using their official code repositories, where the masking ratio is 0.6 and 0.75, respectively. We use Vision Transformer (Dosovitskiy et al., 2020) (ViT-B) and Swin Transformer (Liu et al., 2021) (Swin-T) for backbone architectures. ... The implementation is built upon an official code of LUMP. (See the masking sketch below the table.) |
| Open Datasets | Yes | Datasets: We use the ImageNet-1K (Deng et al., 2009) and CIFAR-100 (Krizhevsky et al., 2009) datasets, splitting each into ten tasks, where each task contains 100 and 10 classes, respectively. |
| Dataset Splits | No | We use only 10% of the training instances in each task for pre-training, and use the full set for fine-tuning and linear probing. Accuracy is measured on the validation set for each task. (See the task-split sketch below the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running its experiments. |
| Software Dependencies | No | We use the AdamW optimizer (Loshchilov & Hutter, 2017) with cosine learning rate decay and warmup for all experiments. For the pre-training phase at each task, we train the model for 60 epochs on supervised learning models and 100 epochs on unsupervised learning models, as self-supervised learning methods without label supervision may require more iterations to converge. |
| Experiment Setup | Yes | Training setups and hyperparameters: We use the AdamW optimizer (Loshchilov & Hutter, 2017) with cosine learning rate decay and warmup for all experiments. For the pre-training phase at each task, we train the model for 60 epochs on supervised learning models and 100 epochs on unsupervised learning models, as self-supervised learning methods without label supervision may require more iterations to converge. For fine-tuning, we perform 30 epochs of training. ... We set the batch size to 64 for SimSiam pre-training and 128 otherwise. Table 3 summarizes the learning rate and training epochs for the experiments, and we linearly scale the learning rate with batch size/512 in practice to reflect the input variance, following (Goyal et al., 2017). (See the training-configuration sketch below the table.) |
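
The Open Source Code row notes that the masked-image-modeling baselines follow SimMIM and MAE with masking ratios of 0.6 and 0.75. The following is a minimal sketch of random patch masking at a configurable ratio, not the authors' code; the patch count, tensor shapes, and function name are assumptions for illustration only.

```python
import torch

def random_patch_mask(batch_size: int, num_patches: int, mask_ratio: float,
                      device: str = "cpu") -> torch.Tensor:
    """Return a boolean mask of shape (batch_size, num_patches) where True marks
    patches hidden from the encoder, as in SimMIM-/MAE-style masked image
    modeling (ratios 0.6 and 0.75 in the quoted setup)."""
    num_masked = int(num_patches * mask_ratio)
    # Rank random noise per patch; the lowest `num_masked` scores are masked.
    noise = torch.rand(batch_size, num_patches, device=device)
    ids_sorted = noise.argsort(dim=1)
    mask = torch.zeros(batch_size, num_patches, device=device)
    mask.scatter_(1, ids_sorted[:, :num_masked], 1.0)
    return mask.bool()

# Example: a ViT-B/16 input of 224x224 yields (224 / 16) ** 2 = 196 patches.
mask = random_patch_mask(batch_size=8, num_patches=196, mask_ratio=0.75)
assert mask.sum(dim=1).eq(int(196 * 0.75)).all()
```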
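The Open Datasets and Dataset Splits rows describe splitting CIFAR-100 (or ImageNet-1K) into ten class-disjoint tasks and pre-training on only 10% of each task's training instances. Below is a minimal sketch of that split for CIFAR-100, assuming torchvision's dataset class and a fixed seed; both are assumptions, not details from the paper.

```python
import random
from collections import defaultdict
from torch.utils.data import Subset
from torchvision.datasets import CIFAR100

NUM_TASKS = 10            # 10 classes per task for CIFAR-100 (100 for ImageNet-1K)
PRETRAIN_FRACTION = 0.10  # only 10% of training instances are used for pre-training

train_set = CIFAR100(root="./data", train=True, download=True)

# Group training indices by label, then assign contiguous class blocks to tasks.
indices_by_class = defaultdict(list)
for idx, label in enumerate(train_set.targets):
    indices_by_class[label].append(idx)

classes_per_task = len(indices_by_class) // NUM_TASKS
rng = random.Random(0)  # assumed seed; the paper does not specify one

task_pretrain_sets, task_full_sets = [], []
for t in range(NUM_TASKS):
    task_classes = range(t * classes_per_task, (t + 1) * classes_per_task)
    full_indices = [i for c in task_classes for i in indices_by_class[c]]
    # Pre-training uses a 10% subsample; fine-tuning and linear probing use the full set.
    subsample = rng.sample(full_indices, int(len(full_indices) * PRETRAIN_FRACTION))
    task_full_sets.append(Subset(train_set, full_indices))
    task_pretrain_sets.append(Subset(train_set, subsample))
```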
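The Software Dependencies and Experiment Setup rows describe the optimization recipe: AdamW, cosine learning rate decay with warmup, and a learning rate scaled linearly by batch size/512 (Goyal et al., 2017). Here is a minimal PyTorch sketch under those assumptions; the base learning rate, weight decay, and warmup length are placeholders, not values from the paper's Table 3.

```python
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

def build_optimizer_and_scheduler(model, base_lr, batch_size, epochs,
                                  steps_per_epoch, warmup_epochs=5):
    # Linear scaling rule (Goyal et al., 2017): lr = base_lr * batch_size / 512.
    lr = base_lr * batch_size / 512
    optimizer = AdamW(model.parameters(), lr=lr, weight_decay=0.05)  # weight decay assumed

    total_steps = epochs * steps_per_epoch
    warmup_steps = warmup_epochs * steps_per_epoch

    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)                  # linear warmup
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))       # cosine decay to 0

    return optimizer, LambdaLR(optimizer, lr_lambda)

# Example: 100 pre-training epochs at batch size 128, as in the unsupervised setting.
model = torch.nn.Linear(10, 10)  # stand-in module for illustration
optimizer, scheduler = build_optimizer_and_scheduler(
    model, base_lr=1.5e-4, batch_size=128, epochs=100, steps_per_epoch=1000)
```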