A comprehensive, application-oriented study of catastrophic forgetting in DNNs

Authors: B. Pfülb, A. Gepperth

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present a large-scale empirical study of catastrophic forgetting (CF) in modern Deep Neural Network (DNN) models that perform sequential (or: incremental) learning. A new experimental protocol is proposed that enforces typical constraints encountered in application scenarios. As the investigation is empirical, we evaluate CF behavior on the hitherto largest number of visual classification datasets...
Researcher Affiliation | Academia | B. Pfülb & A. Gepperth, Department of Computer Science, Hochschule Fulda, Fulda 36037, Germany, {benedikt.pfuelb,alexander.gepperth}@cs.hs-fulda.de
Pseudocode | Yes | Algorithm 1: The application-oriented evaluation strategy used in this study. (A hedged Python sketch of this strategy follows the table.)
Open Source Code | Yes | The source code for all processed models, the experiment-generator and evaluation routine can be found in our publicly available repository: https://gitlab.informatik.hs-fulda.de/ML-Projects/CF_in_DNNs
Open Datasets | Yes | We select the following datasets (see Tab. 1). In order to construct SLTs uniformly across datasets, we choose the 10 best-represented classes (or random classes if balanced) if more are present; a sketch of this selection rule follows the table. MNIST (LeCun et al., 1998)... EMNIST (Cohen et al., 2017)... Fruits 360 (Mureşan & Oltean, 2017)... Devanagari (Acharya et al., 2015)... Fashion MNIST (Xiao et al., 2017)... SVHN (Netzer et al., 2011)... CIFAR10 (Krizhevsky, 2009)... NotMNIST (Bulatov)... MADBase (Abdelazeem & El-Sherif)...
Dataset Splits | No | Table 1 lists 'train' and 'test' data counts for each dataset. Although hyper-parameter optimization is performed, the paper does not explicitly define a distinct validation split, by percentage or count, separate from the training and test sets.
Hardware Specification | Yes | We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.
Software Dependencies | Yes | For all tested DNN models (see below), we use a TensorFlow (v1.7) implementation under Python (v3.4 and later).
Experiment Setup | Yes | We vary the number of hidden layers L ∈ {2, 3} and their size S ∈ {200, 400, 800} (CNNs excluded), the learning rate ϵ1 ∈ {0.01, 0.001} for sub-task D1, and the re-training learning rate ϵ2 ∈ {0.001, 0.0001, 0.00001} for sub-task D2 (this grid is enumerated in a sketch below the table). The batch size is fixed to 100 for all experiments, and is used for both training and testing. As in other studies, we do not use a fixed number of training iterations, but specify the number of training epochs (i.e., passes through the whole dataset) as E = 10 for each processed dataset... For all models that use dropout, the dropout rate for the input layer is fixed to 0.2, and to 0.5 for all hidden layers... The LWTA block size is fixed to 2... The model parameter λ for EWC is set to λ = 1/ϵ2... For all models except IMM, the momentum parameter for the optimizer is set to µ = 0.99... For the IMM models... the regularizer value for the L2-regularization is set to 0.01 for L2-transfer and to 0.0 for weight transfer.
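
The following is a minimal Python sketch of the evaluation strategy named in the "Pseudocode" row (Algorithm 1): select a model on sub-task D1 alone, then re-train it on sub-task D2 and measure accuracy on both sub-tasks. The callables `train_fn`/`eval_fn`, the `model_factory`, and the `.train`/`.test` attributes are hypothetical placeholders, not the authors' actual API.

```python
import copy

# Hedged sketch of the application-oriented evaluation strategy
# (Algorithm 1). `train_fn(model, data, lr, epochs)` and
# `eval_fn(model, data)` are assumed helpers, not the paper's code.
def apply_strategy(model_factory, grid, D1, D2, train_fn, eval_fn,
                   eps2_values=(0.001, 0.0001, 0.00001), epochs=10):
    # Phase 1: model selection using D1 only. The application constraint
    # is that D2 is unknown at selection time.
    best_model, best_acc = None, -1.0
    for hp in grid:
        model = model_factory(hp)
        train_fn(model, D1.train, lr=hp["eps1"], epochs=epochs)
        acc = eval_fn(model, D1.test)
        if acc > best_acc:
            best_model, best_acc = model, acc

    # Phase 2: re-train the selected model on D2 with each re-training
    # rate eps2, then test on both sub-tasks; the drop in acc_D1 after
    # re-training quantifies catastrophic forgetting.
    results = []
    for eps2 in eps2_values:
        model = copy.deepcopy(best_model)
        train_fn(model, D2.train, lr=eps2, epochs=epochs)
        results.append({"eps2": eps2,
                        "acc_D1": eval_fn(model, D1.test),   # retention
                        "acc_D2": eval_fn(model, D2.test)})  # acquisition
    return results
```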
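The class-selection rule quoted in the "Open Datasets" row (keep the 10 best-represented classes, or 10 random classes if the dataset is balanced) can be illustrated as below. This is a sketch under the assumption that labels arrive as a 1-D integer array; the authors' implementation may differ.

```python
import numpy as np

def select_classes(labels, n_keep=10, rng=np.random.default_rng(0)):
    """Return the n_keep classes used to build SLTs from a dataset."""
    classes, counts = np.unique(labels, return_counts=True)
    if len(classes) <= n_keep:                    # 10 or fewer: keep all
        return classes
    if counts.min() == counts.max():              # balanced: pick at random
        return rng.choice(classes, size=n_keep, replace=False)
    order = np.argsort(counts)[::-1]              # otherwise: most frequent
    return classes[order[:n_keep]]
```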
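Finally, the hyper-parameter grid listed in the "Experiment Setup" row can be enumerated as follows. The values come from the quoted text; the variable names and dictionary layout are ours.

```python
from itertools import product

L_values    = (2, 3)                     # number of hidden layers
S_values    = (200, 400, 800)            # hidden-layer size (CNNs excluded)
eps1_values = (0.01, 0.001)              # learning rate for sub-task D1
eps2_values = (0.001, 0.0001, 0.00001)   # re-training rate for sub-task D2

grid = [dict(L=L, S=S, eps1=e1, eps2=e2)
        for L, S, e1, e2 in product(L_values, S_values,
                                    eps1_values, eps2_values)]
assert len(grid) == 2 * 3 * 2 * 3        # 36 combinations per model/SLT
```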