Understanding the Role of Training Regimes in Continual Learning

Authors: Seyed Iman Mirzadeh, Mehrdad Farajtabar, Razvan Pascanu, Hassan Ghasemzadeh

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our study provides practical insights to improve stability via simple yet effective techniques that outperform alternative baselines. Crucially, we empirically show that jointly with a carefully tuned learning rate schedule and batch size, these simple techniques can outperform considerably more complex algorithms meant to deal with continual learning (Section 5). In this section, after explaining our experimental setup, we show the relationship between the curvature of the loss function and the amount of forgetting. (See the learning-rate schedule sketch below the table.)
Researcher Affiliation | Collaboration | Seyed Iman Mirzadeh, Washington State University, USA (seyediman.mirzadeh@wsu.edu); Mehrdad Farajtabar, DeepMind, USA (farajtabar@google.com); Razvan Pascanu, DeepMind, UK (razp@google.com); Hassan Ghasemzadeh, Washington State University, USA (hassan.ghasemzadeh@wsu.edu)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at: https://github.com/imirzadeh/stable-continual-learning
Open Datasets | Yes | Datasets. We perform our experiments on three standard continual learning benchmarks: Permuted MNIST [22], Rotated MNIST, and Split CIFAR-100. (See the Permuted MNIST sketch below the table.)
Dataset Splits | Yes | We use two metrics from [6, 7, 9] to evaluate continual learning algorithms when the number of tasks is large. (1) Average Accuracy: the average validation accuracy after the model has been trained sequentially up to task t, defined by $A_t = \frac{1}{t}\sum_{i=1}^{t} a_{t,i}$ (Eq. 7), where $a_{t,i}$ is the validation accuracy on dataset i when the model finished learning task t. (See the metric sketch below the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper cites PyTorch [59] and refers to "pytorch-hessian-eigenthings" [21], but does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | Fig. 1 shows how significantly these techniques can overcome catastrophic forgetting. The two hyper-parameter settings reported in Fig. 1 are: dropout=0.5, bs=16, lr=0.25, lr decay=0.4, hiddens=100 versus dropout=0.0, bs=256, lr=0.05, lr decay=1, hiddens=100. For brevity, we include the detailed hyper-parameters, the code, and instructions for reproducing the results in the supplementary file. (See the configuration sketch below the table.)
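
The Research Type row mentions a carefully tuned learning rate schedule. The sketch below shows one plausible reading of such a schedule, assuming the rate is restored to its initial value at each task boundary and multiplied by a fixed decay factor after every epoch; the authors' exact schedule is given in their code and supplementary material, not here.

```python
# Minimal sketch of an exponentially decaying learning-rate schedule.
# Assumption (not stated in the excerpt above): the rate resets to its initial
# value at each task boundary and decays by `lr_decay` after every epoch.

def lr_for_epoch(initial_lr: float, lr_decay: float, epoch: int) -> float:
    """Learning rate used at `epoch` (0-indexed) within the current task."""
    return initial_lr * (lr_decay ** epoch)

# Example with the Fig. 1 "stable" settings (lr=0.25, lr decay=0.4):
for epoch in range(4):
    print(f"epoch {epoch}: lr = {lr_for_epoch(0.25, 0.4, epoch):.4f}")
```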
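
The Open Datasets row lists Permuted MNIST among the benchmarks. Below is a hedged sketch of how such tasks are commonly built (one fixed random pixel permutation per task); the function name, seeds, and transforms are illustrative assumptions, not the authors' implementation, which lives in the linked repository.

```python
# Hedged sketch of Permuted MNIST task construction: each task applies a fixed
# random pixel permutation to every image. Names and seeds are assumptions.
import numpy as np
import torch
from torchvision import datasets, transforms

def make_permuted_mnist_task(seed, root="./data", train=True):
    """Return an MNIST dataset whose 784 pixels are shuffled by a fixed permutation."""
    perm = torch.from_numpy(np.random.RandomState(seed).permutation(28 * 28))
    tfm = transforms.Compose([
        transforms.ToTensor(),                           # image -> tensor in [0, 1]
        transforms.Lambda(lambda x: x.view(-1)[perm]),   # flatten and permute pixels
    ])
    return datasets.MNIST(root, train=train, download=True, transform=tfm)

# One permutation (i.e., one task) per seed.
tasks = [make_permuted_mnist_task(seed=t) for t in range(5)]
```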
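
The Dataset Splits row quotes the Average Accuracy metric (Eq. 7). A minimal sketch of that computation, using a purely hypothetical accuracy matrix for illustration:

```python
# Average Accuracy (Eq. 7): A_t = (1/t) * sum_{i=1}^{t} a_{t,i},
# where a_{t,i} is the validation accuracy on task i after training up to task t.

def average_accuracy(acc_matrix, t):
    """Average validation accuracy over the first t tasks, evaluated after task t."""
    row = acc_matrix[t - 1]          # a_{t,1}, ..., a_{t,t}
    return sum(row[:t]) / t

# Hypothetical accuracies recorded after each of three sequential tasks.
acc = [
    [0.95, 0.00, 0.00],   # after task 1
    [0.90, 0.94, 0.00],   # after task 2
    [0.85, 0.88, 0.93],   # after task 3
]
print(average_accuracy(acc, t=3))   # (0.85 + 0.88 + 0.93) / 3 ≈ 0.887
```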
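
Finally, the two Fig. 1 configurations quoted in the Experiment Setup row, written out explicitly. Mapping them onto a PyTorch model and optimizer below is a hedged illustration under the assumption of a two-hidden-layer MLP of the quoted width; the authors' full training script and hyper-parameters are in the supplementary material and repository.

```python
# The two Fig. 1 configurations as explicit config dicts. The model/optimizer
# mapping is an illustrative assumption, not the authors' training script.
import torch
import torch.nn as nn

STABLE = dict(dropout=0.5, batch_size=16,  lr=0.25, lr_decay=0.4, hidden_units=100)
PLAIN  = dict(dropout=0.0, batch_size=256, lr=0.05, lr_decay=1.0, hidden_units=100)

def build_mlp(cfg, in_dim=28 * 28, n_classes=10):
    """MLP sized by the quoted `hiddens` value; the layer count is assumed here."""
    h = cfg["hidden_units"]
    return nn.Sequential(
        nn.Linear(in_dim, h), nn.ReLU(), nn.Dropout(cfg["dropout"]),
        nn.Linear(h, h), nn.ReLU(), nn.Dropout(cfg["dropout"]),
        nn.Linear(h, n_classes),
    )

model = build_mlp(STABLE)
optimizer = torch.optim.SGD(model.parameters(), lr=STABLE["lr"])
# Per-epoch exponential decay matching the quoted `lr decay` value.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=STABLE["lr_decay"])
```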