Parameter-Level Soft-Masking for Continual Learning

Authors: Tatsuya Konishi, Mori Kurokawa, Chihiro Ono, Zixuan Ke, Gyuhak Kim, Bing Liu

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the effectiveness of SPG in achieving all three objectives.
Researcher Affiliation | Collaboration | (1) KDDI Research, Inc., Fujimino, Japan. (2) University of Illinois at Chicago, Chicago, United States.
Pseudocode | Yes | Algorithm 1: Continual Learning in SPG.
Open Source Code | Yes | The code is available at https://github.com/UIC-Liu-Lab/spg.
Open Datasets | Yes | Datasets: The proposed SPG is evaluated using 5 CL datasets. Their statistics are given in Table 1. ... (1) CIFAR100-n (C-n): CIFAR100 (Krizhevsky & Hinton, 2009)... (2) TinyImageNet-n (T-n): TinyImageNet (Wu et al., 2017)... (3) ImageNet-100 (I-100): ImageNet (Russakovsky et al., 2015)... (4) F-CelebA-n (FC-n): Federated CelebA (Liu et al., 2015)... (5) F-EMNIST-n (FE-n): Federated EMNIST (Liu et al., 2015)...
Dataset Splits | Yes | Table 1. Statistics of the CL datasets. n can take 10 and 20. Validation sets are used for early stopping. ... Dataset C-n: #Train 45,000, #Validation 5,000, #Test 10,000 ...
Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU or CPU models. It only mentions using AlexNet as a backbone.
Software Dependencies | No | The paper mentions optimizers (SGD, RMSProp) and loss functions (cross-entropy), but does not provide specific version numbers for any software libraries or frameworks used.
Experiment Setup | Yes | The networks are trained with SGD by minimizing the cross-entropy loss, except for TAG, which uses the RMSProp optimizer because an SGD-based TAG has not been provided by its authors. The mini-batch size is 64, except for MTL, which uses 640 for stability when learning more tasks and classes together. Hyper-parameters, such as the dropout rate or each method's specific hyper-parameters, are searched with the Tree-structured Parzen Estimator (Bergstra et al., 2011). With the best hyper-parameters found, the experiments are run 5 times with different seeds, and the average accuracy and standard deviation are reported.
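The Experiment Setup row describes a two-stage protocol: a Tree-structured Parzen Estimator (TPE) search over hyper-parameters, followed by 5 runs with different seeds whose mean accuracy and standard deviation are reported. Below is a minimal, hypothetical sketch of that protocol, not the authors' code: it assumes Optuna's TPESampler as one off-the-shelf TPE implementation (the paper only cites Bergstra et al., 2011), and `train_and_evaluate` is a placeholder for a real training run.

```python
# Sketch of the tuning/evaluation protocol described above (not the authors'
# code): TPE hyper-parameter search, then 5 seeded runs reported as mean +/- std.
# Assumptions: Optuna's TPESampler stands in for the TPE of Bergstra et al. (2011);
# `train_and_evaluate` is a hypothetical stand-in for the real SGD/cross-entropy run.

import numpy as np
import optuna


def train_and_evaluate(lr: float, dropout: float, seed: int) -> float:
    """Hypothetical stand-in: returns a validation-style accuracy for one run."""
    rng = np.random.default_rng(seed)
    # Fake response surface peaking near lr=0.05, dropout=0.2, plus seed noise.
    acc = 0.75 - 2.0 * (lr - 0.05) ** 2 - 0.5 * (dropout - 0.2) ** 2
    return float(acc + rng.normal(scale=0.01))


def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("lr", 1e-3, 1e-1, log=True)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    # Tune on a fixed seed; the multi-seed evaluation happens afterwards.
    return train_and_evaluate(lr, dropout, seed=0)


# Tree-structured Parzen Estimator search for the best hyper-parameters.
study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=30)
best = study.best_params

# Re-run 5 times with different seeds using the best hyper-parameters found.
accs = [train_and_evaluate(best["lr"], best["dropout"], seed=s) for s in range(5)]
print(f"accuracy: {np.mean(accs):.4f} +/- {np.std(accs):.4f}")
```

In the paper's actual setup, the placeholder run would correspond to training an AlexNet backbone with SGD and cross-entropy on one of the CL datasets in Table 1, with early stopping on the validation split.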