Parameter-Level Soft-Masking for Continual Learning

Authors: Tatsuya Konishi, Mori Kurokawa, Chihiro Ono, Zixuan Ke, Gyuhak Kim, Bing Liu

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the effectiveness of SPG in achieving all three objectives.
Researcher Affiliation | Collaboration | (1) KDDI Research, Inc., Fujimino, Japan. (2) University of Illinois at Chicago, Chicago, United States.
Pseudocode | Yes | Algorithm 1: Continual Learning in SPG.
Open Source Code | Yes | The code is available at https://github.com/UIC-Liu-Lab/spg.
Open Datasets | Yes | Datasets: The proposed SPG is evaluated using 5 CL datasets. Their statistics are given in Table 1. ... (1) CIFAR100-n (C-n): CIFAR100 (Krizhevsky & Hinton, 2009)... (2) TinyImageNet-n (T-n): TinyImageNet (Wu et al., 2017)... (3) ImageNet-100 (I-100): ImageNet (Russakovsky et al., 2015)... (4) F-CelebA-n (FC-n): Federated CelebA (Liu et al., 2015)... (5) F-EMNIST-n (FE-n): Federated EMNIST (Liu et al., 2015)...
Dataset Splits | Yes | Table 1. Statistics of the CL datasets. n can take 10 and 20. Validation sets are used for early stopping. ... Dataset C-n: #Train 45,000, #Validation 5,000, #Test 10,000 ...
Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU or CPU models. It only mentions using AlexNet as a backbone.
Software Dependencies | No | The paper mentions optimizers (SGD, RMSProp) and loss functions (cross-entropy), but does not provide specific version numbers for any software libraries or frameworks used.
Experiment Setup | Yes | The networks are trained with SGD by minimizing the cross-entropy loss, except for TAG, which uses the RMSProp optimizer because an SGD-based TAG has not been provided by its authors. The mini-batch size is 64, except for MTL, which uses 640 for stability when learning more tasks and classes together. Hyper-parameters, such as the dropout rate or each method's specific hyper-parameters, are searched with the Tree-structured Parzen Estimator (Bergstra et al., 2011). With the best hyper-parameters found, the experiments are run 5 times with different seeds, and the average accuracy and standard deviation are reported.
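The Experiment Setup row describes a two-stage protocol: a Tree-structured Parzen Estimator (TPE) search over hyper-parameters, followed by 5 runs with different seeds whose mean accuracy and standard deviation are reported. Below is a minimal, hypothetical sketch of that protocol, not the authors' code: it assumes Optuna's TPESampler as one off-the-shelf TPE implementation (the paper only cites Bergstra et al., 2011), and `train_and_evaluate` is a placeholder for a real training run.

```python
# Sketch of the tuning/evaluation protocol described above (not the authors'
# code): TPE hyper-parameter search, then 5 seeded runs reported as mean +/- std.
# Assumptions: Optuna's TPESampler stands in for the TPE of Bergstra et al. (2011);
# `train_and_evaluate` is a hypothetical stand-in for the real SGD/cross-entropy run.

import numpy as np
import optuna


def train_and_evaluate(lr: float, dropout: float, seed: int) -> float:
    """Hypothetical stand-in: returns a validation-style accuracy for one run."""
    rng = np.random.default_rng(seed)
    # Fake response surface peaking near lr=0.05, dropout=0.2, plus seed noise.
    acc = 0.75 - 2.0 * (lr - 0.05) ** 2 - 0.5 * (dropout - 0.2) ** 2
    return float(acc + rng.normal(scale=0.01))


def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("lr", 1e-3, 1e-1, log=True)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    # Tune on a fixed seed; the multi-seed evaluation happens afterwards.
    return train_and_evaluate(lr, dropout, seed=0)


# Tree-structured Parzen Estimator search for the best hyper-parameters.
study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=30)
best = study.best_params

# Re-run 5 times with different seeds using the best hyper-parameters found.
accs = [train_and_evaluate(best["lr"], best["dropout"], seed=s) for s in range(5)]
print(f"accuracy: {np.mean(accs):.4f} +/- {np.std(accs):.4f}")
```

In the paper's actual setup, the placeholder run would correspond to training an AlexNet backbone with SGD and cross-entropy on one of the CL datasets in Table 1, with early stopping on the validation split.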