Understanding the Role of Training Regimes in Continual Learning
Authors: Seyed Iman Mirzadeh, Mehrdad Farajtabar, Razvan Pascanu, Hassan Ghasemzadeh
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our study provides practical insights to improve stability via simple yet effective techniques that outperform alternative baselines. Crucially, we empirically show that jointly with a carefully tuned learning rate schedule and batch size, these simple techniques can outperform considerably more complex algorithms meant to deal with continual learning (Section 5). In this section, after explaining our experimental setup, we show the relationship between the curvature of the loss function and the amount of forgetting. |
| Researcher Affiliation | Collaboration | Seyed Iman Mirzadeh, Washington State University, USA (seyediman.mirzadeh@wsu.edu); Mehrdad Farajtabar, DeepMind, USA (farajtabar@google.com); Razvan Pascanu, DeepMind, UK (razp@google.com); Hassan Ghasemzadeh, Washington State University, USA (hassan.ghasemzadeh@wsu.edu) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at: https://github.com/imirzadeh/stable-continual-learning |
| Open Datasets | Yes | Datasets. We perform our experiments on three standard continual learning benchmarks: Permuted MNIST [22], Rotated MNIST, and Split CIFAR-100. |
| Dataset Splits | Yes | We use two metrics from [6, 7, 9] to evaluate continual learning algorithms when the number of tasks is large. (1) Average Accuracy: The average validation accuracy after the model has been trained sequentially up to task $t$, defined by $A_t = \frac{1}{t}\sum_{i=1}^{t} a_{t,i}$ (Eq. 7), where $a_{t,i}$ is the validation accuracy on dataset $i$ when the model finished learning task $t$ (see the worked example after this table). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using PyTorch by citing it [59] and refers to "pytorch-hessian-eigenthings" [21] but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Fig. 1 shows how significantly these techniques can overcome catastrophic forgetting. Two hyper-parameter sets are reported: dropout=0.5, batch size=16, lr=0.25, lr decay=0.4, hidden units=100; and dropout=0.0, batch size=256, lr=0.05, lr decay=1, hidden units=100 (see the configuration sketch after this table). For brevity, we include the detailed hyper-parameters, the code, and instructions for reproducing the results in the supplementary file. |
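As a worked illustration of the Average Accuracy metric quoted in the Dataset Splits row (Eq. 7), below is a minimal Python sketch. The `acc_matrix` layout and the example numbers are assumptions chosen for illustration, not values from the paper.

```python
import numpy as np

def average_accuracy(acc_matrix: np.ndarray, t: int) -> float:
    """Average Accuracy after learning task t (Eq. 7):
    A_t = (1/t) * sum_{i=1..t} a_{t,i},
    where acc_matrix[t-1, i-1] holds a_{t,i}, the validation accuracy on
    task i measured after the model finished learning task t (tasks 1-indexed).
    """
    return float(np.mean(acc_matrix[t - 1, :t]))

# Hypothetical 3-task run: rows = "after training task t", cols = "evaluated on task i".
acc = np.array([
    [0.98, 0.00, 0.00],
    [0.90, 0.97, 0.00],
    [0.85, 0.92, 0.96],
])
print(average_accuracy(acc, t=3))  # (0.85 + 0.92 + 0.96) / 3 = 0.91
```

Entries above the diagonal are never read, since only tasks seen so far enter the average.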
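The Experiment Setup row lists two hyper-parameter sets from Fig. 1. The sketch below shows one way they could be wired into a PyTorch model and optimizer; the "stable"/"plain" labels, the two-hidden-layer MLP shape, and the per-task reading of "lr decay" are assumptions, not details confirmed by the quoted text (the paper defers full details to its supplementary material).

```python
import torch
import torch.nn as nn

# Hyper-parameter sets as quoted above; labels are an assumption based on Fig. 1's framing.
CONFIGS = {
    "stable": dict(dropout=0.5, batch_size=16, lr=0.25, lr_decay=0.4, hidden=100),
    "plain":  dict(dropout=0.0, batch_size=256, lr=0.05, lr_decay=1.0, hidden=100),
}

def build_mlp(cfg, in_dim=784, n_classes=10):
    """Two-hidden-layer MLP using the quoted hidden width and dropout rate
    (the exact architecture here is an assumption)."""
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(in_dim, cfg["hidden"]), nn.ReLU(), nn.Dropout(cfg["dropout"]),
        nn.Linear(cfg["hidden"], cfg["hidden"]), nn.ReLU(), nn.Dropout(cfg["dropout"]),
        nn.Linear(cfg["hidden"], n_classes),
    )

def make_optimizer(model, cfg):
    """Plain SGD at the quoted initial lr; the scheduler multiplies lr by
    lr_decay each time .step() is called, e.g. once per task (assumed reading
    of 'lr decay')."""
    opt = torch.optim.SGD(model.parameters(), lr=cfg["lr"])
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=cfg["lr_decay"])
    return opt, sched

model = build_mlp(CONFIGS["stable"])
optimizer, scheduler = make_optimizer(model, CONFIGS["stable"])
```

Note that with lr_decay=1.0 the "plain" setting keeps a constant learning rate, which matches the contrast the row draws between the two regimes.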