Overcoming Catastrophic Forgetting by Neuron-Level Plasticity Control

Authors: Inyoung Paik, Sangjun Oh, Taeyeong Kwak, Injung Kim

AAAI 2020, pp. 5339-5346 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results on several datasets show that neuron-level consolidation is substantially more effective than connection-level consolidation approaches.
Researcher Affiliation | Collaboration | Inyoung Paik, Sangjun Oh, Taeyeong Kwak (Deep Bio Inc., Seoul, Republic of Korea; {iypaik, tykwak}@deepbio.com, me@juneoh.net) and Injung Kim (Handong Global University, Pohang, Republic of Korea; ijkim@handong.edu)
Pseudocode | Yes | Algorithm 1: Neuron-level Plasticity Control (NPC). (A hedged sketch of the per-neuron update idea appears after this table.)
Open Source Code | No | The paper does not provide any concrete statement or link regarding the public availability of its source code.
Open Datasets | Yes | We experimented on an incremental version of the MNIST (LeCun et al. 1998) and CIFAR100 (Krizhevsky and Hinton 2009) datasets, where each dataset containing X classes was divided into K subsets of X/K classes, each of which is classified by the k-th task. We set K to 5 for MNIST and 10 for CIFAR100. For preprocessing, we applied random cropping with a padding size of 4 for both datasets. We also applied random horizontal flip for the incremental CIFAR100 (iCIFAR100) dataset. Additionally, we experimented on sequential tasks with heterogeneous datasets, composed of MNIST (LeCun et al. 1998), Fashion-MNIST (fMNIST) (Xiao, Rasul, and Vollgraf 2017), EMNIST (balanced split) (Cohen et al. 2017), and smallNORB (LeCun 2004). (See the task-split sketch after this table.)
Dataset Splits | No | The paper mentions using 'validation accuracy' for tuning hyperparameters and in figures, but it does not provide specific details on the dataset splits (e.g., percentages or sample counts) used for training, validation, and testing within each task.
Hardware Specification | Yes | All experiments were performed on a server with 2 NVIDIA Tesla P40 GPUs.
Software Dependencies | No | The paper mentions various algorithms and components such as CNN, Instance Normalization, and SGD, but it does not specify any software dependencies with version numbers (e.g., PyTorch version, Python version, or specific library versions).
Experiment Setup | Yes | We used a simple CNN with 3 convolutional layers with (128, 512, 256) channels and 2 fully connected layers with (512, number of classes) nodes. Each convolutional layer consists of convolution, Instance Normalization, ReLU activation, and (2,2) max pooling. Dropout (Srivastava et al. 2014) with rate 0.2 is applied between the two fully connected layers. The cross-entropy loss for each task was computed from only the output nodes belonging to the current task. For consistency, we redefined the unit of one epoch in all experiments as the cycle in which the total number of training data was seen. ... we trained the models for 30 epochs on each task. As a result, we used α_NPC = 0.05, β_NPC = 0.5, δ_NPC = 1e-4, λ_EWC = 900, λ_MAS = 3.0, λ_SI = 0.08, λ_SSL = 2e-6. In a baseline experiment, we used L2 regularization with λ = 1e-4. We heuristically set η_max = 0.1 for all experiments. (A model sketch based on this description follows the table.)
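The "Pseudocode" row refers to Algorithm 1 (NPC). Below is a minimal, hypothetical sketch of the neuron-level idea: each neuron (output channel) carries an importance estimate, and its effective learning rate shrinks as that importance grows, so consolidated neurons stay stable while unimportant ones remain plastic. The importance estimator (a running average of |activation × gradient|) and the mapping from importance to learning rate are illustrative assumptions, not the paper's exact formulas; only the hyperparameter values (α_NPC = 0.05, β_NPC = 0.5, δ_NPC = 1e-4, η_max = 0.1) come from the quoted setup, and they may enter the paper's equations differently.

```python
# Illustrative sketch of neuron-level plasticity control (NPC), not the
# paper's exact algorithm. Assumption: per-neuron importance is a running
# average of |activation * d(loss)/d(activation)|, and each neuron's
# learning rate decays with importance, capped at eta_max.
import torch
import torch.nn as nn

ETA_MAX = 0.1                          # eta_max from the quoted setup
ALPHA, BETA, DELTA = 0.05, 0.5, 1e-4   # alpha_NPC, beta_NPC, delta_NPC


def update_importance(importance, activation, grad_activation, momentum=0.9):
    """Running per-neuron importance from activations and their gradients."""
    contrib = (activation * grad_activation).abs()
    # Average over batch and spatial dimensions -> one score per channel.
    reduce_dims = [d for d in range(contrib.dim()) if d != 1]
    contrib = contrib.mean(dim=reduce_dims)
    return momentum * importance + (1.0 - momentum) * contrib


def neuron_learning_rates(importance):
    """Map importance to a per-neuron learning rate (illustrative formula)."""
    lr = ALPHA / (importance.pow(BETA) + DELTA)
    return lr.clamp(max=ETA_MAX)


def npc_step(conv: nn.Conv2d, importance):
    """SGD step in which every output neuron uses its own learning rate."""
    lr = neuron_learning_rates(importance)            # shape: (out_channels,)
    with torch.no_grad():
        conv.weight -= lr.view(-1, 1, 1, 1) * conv.weight.grad
        if conv.bias is not None:
            conv.bias -= lr * conv.bias.grad
```

In this reading, the per-neuron rates are recomputed from the accumulated importance as new tasks arrive, so neurons that mattered for earlier tasks receive only small updates later on, which is the neuron-level counterpart of connection-level consolidation methods such as EWC.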
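The "Open Datasets" row describes splitting each dataset into K tasks of X/K classes (K = 5 for incremental MNIST, K = 10 for iCIFAR100), with random cropping (padding 4) for both and random horizontal flips for iCIFAR100. The sketch below shows one way to build those task streams; the use of torchvision, the Subset-based split, the ./data path, and the crop sizes are assumptions.

```python
# Sketch of the incremental-task construction quoted above: a dataset with
# X classes is divided into K tasks of X/K consecutive classes.
import torchvision
import torchvision.transforms as T
from torch.utils.data import Subset


def incremental_tasks(dataset, num_classes, K):
    """Split a labeled dataset into K tasks of num_classes // K classes each."""
    per_task = num_classes // K
    labels = [int(y) for y in dataset.targets]
    tasks = []
    for k in range(K):
        classes = set(range(k * per_task, (k + 1) * per_task))
        indices = [i for i, y in enumerate(labels) if y in classes]
        tasks.append(Subset(dataset, indices))
    return tasks


# iCIFAR100: random crop with padding 4, random horizontal flip, 10 tasks of 10 classes.
cifar_tf = T.Compose([T.RandomCrop(32, padding=4), T.RandomHorizontalFlip(), T.ToTensor()])
cifar = torchvision.datasets.CIFAR100("./data", train=True, download=True, transform=cifar_tf)
cifar_tasks = incremental_tasks(cifar, num_classes=100, K=10)

# Incremental MNIST: random crop with padding 4, 5 tasks of 2 classes.
mnist_tf = T.Compose([T.RandomCrop(28, padding=4), T.ToTensor()])
mnist = torchvision.datasets.MNIST("./data", train=True, download=True, transform=mnist_tf)
mnist_tasks = incremental_tasks(mnist, num_classes=10, K=5)
```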
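The "Experiment Setup" row specifies the backbone closely enough to write down a matching model. In the sketch below, the kernel size, padding, ReLU placement in the classifier head, and the flattened feature size (computed for 32x32 inputs) are assumptions not stated in the quoted text, and the task_cross_entropy helper is a hypothetical illustration of computing the loss from only the current task's output nodes.

```python
# Backbone matching the quoted setup: three conv blocks with (128, 512, 256)
# channels, each block = Conv -> InstanceNorm -> ReLU -> 2x2 max pool, then
# FC(512) -> Dropout(0.2) -> FC(num_classes). Kernel size 3, padding 1, and
# the 32x32 input resolution (hence 256 * 4 * 4 features) are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleCNN(nn.Module):
    def __init__(self, in_channels=3, num_classes=100, feat_dim=256 * 4 * 4):
        super().__init__()

        def block(c_in, c_out):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.InstanceNorm2d(c_out, affine=True),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2, 2),
            )

        self.features = nn.Sequential(
            block(in_channels, 128), block(128, 512), block(512, 256)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(feat_dim, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.2),
            nn.Linear(512, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))


def task_cross_entropy(logits, labels, task_classes):
    """Cross-entropy restricted to the output nodes of the current task."""
    sub_logits = logits[:, task_classes]
    remap = {c: i for i, c in enumerate(task_classes)}
    sub_labels = torch.tensor([remap[int(y)] for y in labels], device=logits.device)
    return F.cross_entropy(sub_logits, sub_labels)
```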