Compacting, Picking and Growing for Unforgetting Continual Learning

Authors: Ching-Yi Hung, Cheng-Hao Tu, Cheng-En Wu, Chien-Hung Chen, Yi-Ming Chan, Chu-Song Chen

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our approach can incrementally learn a deep model tackling multiple tasks without forgetting, while the model compactness is maintained with the performance more satisfiable than individual task training.
Researcher Affiliation | Academia | Steven C. Y. Hung, Cheng-Hao Tu, Cheng-En Wu, Chien-Hung Chen, Yi-Ming Chan, and Chu-Song Chen. Institute of Information Science, Academia Sinica, Taipei, Taiwan; MOST Joint Research Center for AI Technology and All Vista Healthcare. {brent12052003, andytu455176}@gmail.com, {chengen, redsword26, yiming, song}@iis.sinica.edu.tw
Pseudocode | Yes | Algorithm 1: Compacting, Picking and Growing Continual Learning. Input: task 1 and an original model trained on task 1. Set an accuracy goal for task 1; alternately remove small weights and re-train the remaining weights for task 1 via gradual pruning [51], as long as the accuracy goal still holds. Let the model weights preserved for task 1 be W^P_1 (referred to as task-1 weights), and those removed by the iterative pruning be W^E_1 (referred to as the released weights). For task k = 2, ..., K (with W^E_k the released weights of task k): set an accuracy goal for task k; apply a mask M to the weights W^P_{1:k-1} and train both M and W^E_{k-1} for task k, with W^P_{1:k-1} fixed; if the accuracy goal is not achieved, expand the number of filters (weights) in the model, reset W^E_{k-1}, and return to the previous step; gradually prune W^E_{k-1} to obtain W^E_k (with W^P_{1:k-1} fixed) for task k until the accuracy goal is met; set W^P_k = W^E_{k-1} \ W^E_k and W^P_{1:k} = W^P_{1:k-1} ∪ W^P_k.
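For concreteness, below is a minimal PyTorch-style sketch of the "picking" step in Algorithm 1, where a learnable binary mask selects frozen weights preserved from earlier tasks while the released weights remain trainable for the new task. The layer name (CPGConv2d), the straight-through binarization, and the mask threshold are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of the CPG "picking" step for one convolutional layer.
# CPGConv2d, mask_threshold, and the straight-through binarization are
# illustrative assumptions, not the paper's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Binarize(torch.autograd.Function):
    """Hard-threshold a real-valued mask; pass gradients straight through."""

    @staticmethod
    def forward(ctx, scores, threshold):
        return (scores > threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # straight-through estimator


class CPGConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, mask_threshold=5e-3):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_ch, in_ch, kernel_size, kernel_size))
        nn.init.kaiming_normal_(self.weight)
        # 1 where a weight was preserved for earlier tasks (W^P_{1:k-1}), else 0.
        self.register_buffer("preserved", torch.zeros_like(self.weight))
        # Real-valued, learnable "picking" mask over the preserved weights.
        self.mask_scores = nn.Parameter(0.01 * torch.ones_like(self.weight))
        self.mask_threshold = mask_threshold

    def forward(self, x):
        picked = Binarize.apply(self.mask_scores, self.mask_threshold)
        # Old weights are reused (frozen via detach) only where the mask picks
        # them; released weights W^E_{k-1} stay trainable for the current task.
        effective = picked * self.preserved * self.weight.detach() \
            + (1.0 - self.preserved) * self.weight
        return F.conv2d(x, effective, padding=self.weight.shape[-1] // 2)
```

Training task k would then optimize only mask_scores and the released entries of weight; the "growing" step corresponds to widening out_ch whenever the accuracy goal cannot be met.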
Open Source Code | Yes | Our codes are available at https://github.com/ivclab/CPG.
Open Datasets | Yes | We divide the CIFAR-100 dataset into 20 tasks. Each task has 5 classes, 2500 training images, and 500 testing images. In the experiment, VGG16-BN model (VGG16 with batch normalization layers) is employed to train the 20 tasks sequentially.
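As a usage illustration of the 20-task split quoted above, here is a minimal sketch assuming torchvision and a contiguous grouping of labels into tasks (the paper does not state which classes form each task):

```python
# Split CIFAR-100 into 20 five-class tasks (2,500 train / 500 test images
# each). The contiguous class grouping is an assumption; the paper does not
# specify how classes are assigned to tasks.
from torch.utils.data import Subset
from torchvision import datasets, transforms

transform = transforms.ToTensor()
train_set = datasets.CIFAR100("data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR100("data", train=False, download=True, transform=transform)


def task_subset(dataset, task_id, classes_per_task=5):
    """Return the subset of `dataset` whose labels belong to task `task_id`."""
    lo = task_id * classes_per_task
    hi = lo + classes_per_task
    idx = [i for i, y in enumerate(dataset.targets) if lo <= y < hi]
    return Subset(dataset, idx)


tasks = [(task_subset(train_set, t), task_subset(test_set, t)) for t in range(20)]
print(len(tasks[0][0]), len(tasks[0][1]))  # 2500 500
```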
Dataset Splits | No | Section 4.1 mentions '2500 training images, and 500 testing images' per task for CIFAR-100, and Table 4 shows '#Train' and '#Eval' counts for the other datasets. However, the paper does not explicitly specify how validation sets are created or used apart from the training/testing splits, nor does it give exact percentages or an explicit validation-split methodology.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments.
Software Dependencies | No | We implement our CPG approach and independent task learning (from scratch or fine-tuning) via PyTorch [30] in all experiments, but implement DEN [27] via TensorFlow [1] with its official codes. However, specific version numbers for PyTorch or TensorFlow are not provided.
Experiment Setup | No | The paper specifies the models used (VGG16-BN, ResNet50, 20-layer CNN from SphereFace) and mentions procedural settings like gradual pruning with an accuracy goal, but it does not provide specific numerical hyperparameters such as learning rates, batch sizes, optimizers, or epochs.