Continual Learning with Node-Importance based Adaptive Group Sparse Regularization

Authors: Sangwon Jung, Hongjoon Ahn, Sungmin Cha, Taesup Moon

Venue: NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Throughout the extensive experimental results, we show that our AGS-CL uses orders of magnitude less memory space for storing the regularization parameters, and it significantly outperforms several state-of-the-art baselines on representative benchmarks for both supervised and reinforcement learning.
Researcher Affiliation | Academia | Sangwon Jung (1), Hongjoon Ahn (2), Sungmin Cha (1), and Taesup Moon (1,2); (1) Department of Electrical and Computer Engineering, (2) Department of Artificial Intelligence, Sungkyunkwan University, Suwon, Korea 16419; {s.jung, hong0805, csm9493, tsmoon}@skku.edu
Pseudocode | Yes | Finally, we summarize our method in Algorithm 1 and Figure 2. (A hedged sketch of the per-epoch proximal step is given below the table.)
Open Source Code | No | The paper states 'Our method was implemented with PyTorch [25]' but does not provide a specific link or an explicit statement about making the source code available.
Open Datasets | Yes | We tested on multiple different vision datasets and thoroughly showed the effectiveness of our method: CIFAR-10/100 [14] was used as a standard benchmark with a smaller number of tasks, Omniglot [16] was used to compare the performance for a large number of tasks, CUB200 [36] was used to test on more complex, large-scale data, and the sequence of 8 different datasets, {CIFAR-10 / CIFAR-100 / MNIST / SVHN / Fashion-MNIST / Traffic-Signs / FaceScrub / NotMNIST}, which was proposed in [33], was used to check the learning capability for different visual domains. We now evaluate the performance of AGS-CL on Atari [6] reinforcement learning (RL) tasks.
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or a detailed splitting methodology) for the training, validation, and test sets. It mentions '5 different random seed runs' but not the proportions used for data partitioning.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, memory amounts, or detailed machine specifications) used for running its experiments; it only states that CNNs were used.
Software Dependencies | No | The paper mentions 'Our method was implemented with PyTorch [25]' and uses an 'Adam [12] step', but does not provide specific version numbers for these software components.
Experiment Setup | Yes | We used multi-headed outputs for all experiments, and 5 different random seed runs (that also shuffle task sequences, except for Omniglot) are averaged for all datasets. ... For all the experiments, we used convolutional neural networks (CNNs) with ReLU activations, of which the architectures are the following: for CIFAR-10/100, we used 6 convolution layers followed by 2 fully connected layers; for Omniglot, we used 4 convolution layers as in [31]; for CUB200, we used AlexNet [15] pre-trained on ImageNet [9]; and for the mixture of different tasks, we used AlexNet trained from scratch. ... The Adam [12] step was used as L_TS,t(θ_k) in (4), and the PGD update (5) was applied once after each epoch. ... µ, λ ≥ 0 are the hyperparameters that set the trade-offs among the penalty terms. ... Moreover, we always used η = 0.9. ... ρ ∈ (0, 1] is a hyperparameter that controls the capacity of the network for learning new tasks, and ρ ≈ 0.5 typically shows a good trade-off. ... Each task is learned with 10^7 steps, and we evaluated the reward of the agent 30 times per task. (Hedged sketches of the per-epoch proximal step and the multi-head CNN setup are given below the table.)
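The Pseudocode and Experiment Setup rows refer to Algorithm 1 and to a PGD (proximal gradient descent) update applied once after each epoch of Adam steps. Algorithm 1 itself is not reproduced in this table, so the snippet below is only a minimal sketch of what a per-node proximal step for the two group-sparse penalties described in the paper could look like in PyTorch; the row-wise grouping of weights, the importance vector `omega`, the hyperparameters `mu` and `lam`, and the step size `lr` are assumptions for illustration, not the paper's exact formulation.

```python
import torch


def proximal_group_update(weight, prev_weight, omega, mu, lam, lr, eps=1e-12):
    """Per-node proximal step for group-sparse penalties (hedged sketch).

    weight      : (n_nodes, fan_in) tensor; row g holds the incoming weights of node g
    prev_weight : the same tensor saved after the previous task
    omega       : (n_nodes,) per-node importance; omega[g] == 0 marks an unused node
    mu, lam     : assumed regularization strengths for the two penalties
    lr          : step size of the preceding gradient step
    """
    with torch.no_grad():
        for g in range(weight.size(0)):
            if omega[g] == 0:
                # Shrink unused nodes toward zero: prox of mu * ||theta_g||_2
                # (block soft-thresholding).
                norm = weight[g].norm() + eps
                weight[g].mul_(torch.clamp(1.0 - lr * mu / norm, min=0.0))
            else:
                # Pull important nodes back toward the previous task's weights:
                # prox of lam * omega_g * ||theta_g - theta_g^{t-1}||_2.
                diff = weight[g] - prev_weight[g]
                norm = diff.norm() + eps
                shrink = torch.clamp(1.0 - lr * lam * omega[g] / norm, min=0.0)
                weight[g].copy_(prev_weight[g] + shrink * diff)
    return weight
```

In this sketch, nodes with zero importance are shrunk toward zero (freeing capacity for future tasks), while important nodes are pulled back toward the weights stored after the previous task; both branches are standard block soft-thresholding operators, which matches the kind of group-sparsity-based regularization the paper describes.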
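For the CIFAR-10/100 setting, the Experiment Setup row mentions 6 convolution layers followed by 2 fully connected layers with multi-headed outputs. The exact channel widths, kernel sizes, and pooling schedule are not given in this table, so the values below (32-32-64-64-128-128 channels, 3x3 kernels, pooling after every second convolution, a 256-unit hidden layer, 3x32x32 inputs) are placeholder assumptions chosen only to make the sketch runnable.

```python
import torch.nn as nn


class MultiHeadCNN(nn.Module):
    """Sketch of a 6-conv + 2-FC network with one output head per task."""

    def __init__(self, num_tasks, classes_per_task,
                 channels=(32, 32, 64, 64, 128, 128)):
        super().__init__()
        layers, in_ch = [], 3
        for i, out_ch in enumerate(channels):
            layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU()]
            if i % 2 == 1:
                # Downsample after every second convolution (assumed schedule).
                layers.append(nn.MaxPool2d(2))
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        # First fully connected layer; 4x4 spatial size assumes 32x32 inputs.
        self.fc = nn.Sequential(nn.Linear(channels[-1] * 4 * 4, 256), nn.ReLU())
        # One linear head per task (multi-headed output).
        self.heads = nn.ModuleList(
            [nn.Linear(256, classes_per_task) for _ in range(num_tasks)]
        )

    def forward(self, x, task_id):
        h = self.fc(self.features(x).flatten(1))
        return self.heads[task_id](h)
```

Each forward pass selects the head of the current task, which is the usual way multi-headed outputs are realized in task-incremental continual learning; the per-task head plays the role of the second fully connected layer.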