Understanding the Role of Training Regimes in Continual Learning
Authors: Seyed Iman Mirzadeh, Mehrdad Farajtabar, Razvan Pascanu, Hassan Ghasemzadeh
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our study provides practical insights to improve stability via simple yet effective techniques that outperform alternative baselines. Crucially, we empirically show that jointly with a carefully tuned learning rate schedule and batch size, these simple techniques can outperform considerably more complex algorithms meant to deal with continual learning (Section 5). In this section, after explaining our experimental setup, we show the relationship between the curvature of the loss function and the amount of forgetting. |
| Researcher Affiliation | Collaboration | Seyed Iman Mirzadeh, Washington State University, USA (seyediman.mirzadeh@wsu.edu); Mehrdad Farajtabar, DeepMind, USA (farajtabar@google.com); Razvan Pascanu, DeepMind, UK (razp@google.com); Hassan Ghasemzadeh, Washington State University, USA (hassan.ghasemzadeh@wsu.edu) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at: https://github.com/imirzadeh/stable-continual-learning |
| Open Datasets | Yes | Datasets. We perform our experiments on three standard continual learning benchmarks: Permuted MNIST [22], Rotated MNIST, and Split CIFAR-100. |
| Dataset Splits | Yes | We use two metrics from [6, 7, 9] to evaluate continual learning algorithms when the number of tasks is large. (1) Average Accuracy: The average validation accuracy after the model has been trained sequentially up to task $t$, defined by $A_t = \frac{1}{t}\sum_{i=1}^{t} a_{t,i}$ (Eq. 7), where $a_{t,i}$ is the validation accuracy on dataset $i$ when the model finished learning task $t$ (see the worked example after this table). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using PyTorch by citing it [59] and refers to "pytorch-hessian-eigenthings" [21] but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Fig. 1 shows how significantly these techniques can overcome catastrophic forgetting. Two hyper-parameter sets are reported: dropout=0.5, batch size=16, lr=0.25, lr decay=0.4, hidden units=100; and dropout=0.0, batch size=256, lr=0.05, lr decay=1, hidden units=100 (see the configuration sketch after this table). For brevity, we include the detailed hyper-parameters, the code, and instructions for reproducing the results in the supplementary file. |
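As a worked illustration of the Average Accuracy metric quoted in the Dataset Splits row (Eq. 7), below is a minimal Python sketch. The `acc_matrix` layout and the example numbers are assumptions chosen for illustration, not values from the paper.

```python
import numpy as np

def average_accuracy(acc_matrix: np.ndarray, t: int) -> float:
    """Average Accuracy after learning task t (Eq. 7):
    A_t = (1/t) * sum_{i=1..t} a_{t,i},
    where acc_matrix[t-1, i-1] holds a_{t,i}, the validation accuracy on
    task i measured after the model finished learning task t (tasks 1-indexed).
    """
    return float(np.mean(acc_matrix[t - 1, :t]))

# Hypothetical 3-task run: rows = "after training task t", cols = "evaluated on task i".
acc = np.array([
    [0.98, 0.00, 0.00],
    [0.90, 0.97, 0.00],
    [0.85, 0.92, 0.96],
])
print(average_accuracy(acc, t=3))  # (0.85 + 0.92 + 0.96) / 3 = 0.91
```

Entries above the diagonal are never read, since only tasks seen so far enter the average.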
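The Experiment Setup row lists two hyper-parameter sets from Fig. 1. The sketch below shows one way they could be wired into a PyTorch model and optimizer; the "stable"/"plain" labels, the two-hidden-layer MLP shape, and the per-task reading of "lr decay" are assumptions, not details confirmed by the quoted text (the paper defers full details to its supplementary material).

```python
import torch
import torch.nn as nn

# Hyper-parameter sets as quoted above; labels are an assumption based on Fig. 1's framing.
CONFIGS = {
    "stable": dict(dropout=0.5, batch_size=16, lr=0.25, lr_decay=0.4, hidden=100),
    "plain":  dict(dropout=0.0, batch_size=256, lr=0.05, lr_decay=1.0, hidden=100),
}

def build_mlp(cfg, in_dim=784, n_classes=10):
    """Two-hidden-layer MLP using the quoted hidden width and dropout rate
    (the exact architecture here is an assumption)."""
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(in_dim, cfg["hidden"]), nn.ReLU(), nn.Dropout(cfg["dropout"]),
        nn.Linear(cfg["hidden"], cfg["hidden"]), nn.ReLU(), nn.Dropout(cfg["dropout"]),
        nn.Linear(cfg["hidden"], n_classes),
    )

def make_optimizer(model, cfg):
    """Plain SGD at the quoted initial lr; the scheduler multiplies lr by
    lr_decay each time .step() is called, e.g. once per task (assumed reading
    of 'lr decay')."""
    opt = torch.optim.SGD(model.parameters(), lr=cfg["lr"])
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=cfg["lr_decay"])
    return opt, sched

model = build_mlp(CONFIGS["stable"])
optimizer, scheduler = make_optimizer(model, CONFIGS["stable"])
```

Note that with lr_decay=1.0 the "plain" setting keeps a constant learning rate, which matches the contrast the row draws between the two regimes.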