Parameter-Level Soft-Masking for Continual Learning
Authors: Tatsuya Konishi, Mori Kurokawa, Chihiro Ono, Zixuan Ke, Gyuhak Kim, Bing Liu
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of SPG in achieving all three objectives. |
| Researcher Affiliation | Collaboration | 1KDDI Research, Inc., Fujimino, Japan. 2University of Illinois at Chicago, Chicago, United States. |
| Pseudocode | Yes | Algorithm 1: Continual Learning in SPG. (A hedged sketch of the gradient soft-masking idea appears after this table.) |
| Open Source Code | Yes | The code is available at https://github.com/UIC-Liu-Lab/spg. |
| Open Datasets | Yes | Datasets: The proposed SPG is evaluated using 5 CL datasets. Their statistics are given in Table 1. ... (1) CIFAR100-n (C-n): CIFAR100 (Krizhevsky & Hinton, 2009)... (2) TinyImageNet-n (T-n): TinyImageNet (Wu et al., 2017)... (3) ImageNet-100 (I-100): ImageNet (Russakovsky et al., 2015)... (4) F-CelebA-n (FC-n): Federated CelebA (Liu et al., 2015)... (5) F-EMNIST-n (FE-n): Federated EMNIST (Liu et al., 2015)... |
| Dataset Splits | Yes | Table 1. Statistics of the CL datasets. n can take 10 or 20. Validation sets are used for early stopping. ... Dataset C-n: #Train 45,000, #Validation 5,000, #Test 10,000 ... |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU or CPU models. It only mentions using AlexNet as a backbone. |
| Software Dependencies | No | The paper mentions optimizers (SGD, RMSProp) and loss functions (cross-entropy), but does not provide specific version numbers for any software libraries or frameworks used. |
| Experiment Setup | Yes | The networks are trained with SGD by minimizing the cross-entropy loss, except for TAG, which uses the RMSProp optimizer because an SGD-based TAG has not been provided by the authors. The mini-batch size is 64, except for MTL, which uses 640 for stability when learning more tasks and classes together. Hyper-parameters, such as the dropout rate or each method's specific hyper-parameters, are searched using the Tree-structured Parzen Estimator (Bergstra et al., 2011); a hedged sketch of such a search appears after this table. With the best hyper-parameters found, the experiments are conducted 5 times with different seeds, and the average accuracy and standard deviation are reported. |
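The Pseudocode row above references Algorithm 1 but the table does not reproduce it. As a rough illustration of the idea behind parameter-level soft-masking (scaling each parameter's gradient by one minus its accumulated importance for earlier tasks), here is a minimal PyTorch sketch. The function names, the normalized-gradient importance measure, and the max-accumulation rule are simplified assumptions for illustration, not the authors' implementation (see the linked repository for that).

```python
# Minimal sketch (assumptions, not the authors' code) of parameter-level gradient
# soft-masking: gradients for a new task are scaled by (1 - accumulated importance),
# so parameters judged important for earlier tasks change less.
import torch
import torch.nn as nn

def soft_mask_gradients(model: nn.Module, importance: dict) -> None:
    """Scale each parameter's gradient element-wise by (1 - importance), in place."""
    for name, param in model.named_parameters():
        if param.grad is not None and name in importance:
            param.grad.mul_(1.0 - importance[name])

def update_importance(model: nn.Module, importance: dict) -> None:
    """Accumulate a simple normalized-gradient-magnitude importance per parameter."""
    for name, param in model.named_parameters():
        if param.grad is None:
            continue
        g = param.grad.abs()
        norm = g / (g.max() + 1e-12)                    # normalize to [0, 1]
        prev = importance.get(name, torch.zeros_like(norm))
        importance[name] = torch.maximum(prev, norm)    # keep the largest importance seen
```

In a training step one would call `loss.backward()`, then `soft_mask_gradients(model, importance)` before `optimizer.step()`; after a task finishes, `update_importance` folds that task's importance into the dictionary used when training subsequent tasks.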
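The Experiment Setup row states that hyper-parameters were searched with the Tree-structured Parzen Estimator, but not which implementation was used. The sketch below assumes Optuna's `TPESampler` as one common TPE implementation; `train_and_validate` is a hypothetical stand-in for a single continual-learning training run that returns validation accuracy, and the search space shown is illustrative only.

```python
# Hedged sketch of a TPE-based hyper-parameter search, assuming Optuna's TPESampler.
# `train_and_validate` is a hypothetical helper standing in for one training run.
import optuna

def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("lr", 1e-4, 1e-1, log=True)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    # Returns validation accuracy; the paper uses the validation set for early stopping.
    return train_and_validate(lr=lr, dropout=dropout, batch_size=64)

study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=50)
print(study.best_params)
```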