Forget-free Continual Learning with Winning Subnetworks

Authors: Haeyong Kang, Rusty John Lloyd Mina, Sultan Rizky Hikmawan Madjid, Jaehong Yoon, Mark Hasegawa-Johnson, Sung Ju Hwang, Chang D. Yoo

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We now validate our method on several benchmark datasets against relevant continual learning baselines.
Researcher Affiliation | Collaboration | (1) Korea Advanced Institute of Science and Technology (KAIST), South Korea; (2) University of Illinois at Urbana-Champaign, USA; (3) AITRICS, South Korea.
Pseudocode | Yes | Algorithm 1: Winning SubNetworks (WSN). (A hedged sketch of the mask-selection step appears after this table.)
Open Source Code | Yes | Code is available at https://github.com/ihaeyong/WSN.
Open Datasets | Yes | We use six different popular sequential datasets for CL problems with five different neural network architectures as follows: 1) Permuted MNIST (PMNIST): a variant of MNIST (LeCun, 1998)... 2) 5-Datasets... CIFAR-10 (Krizhevsky et al., 2009), MNIST (LeCun, 1998), SVHN (Netzer et al., 2011), Fashion-MNIST (Xiao et al., 2017), and notMNIST (Bulatov, 2011). 3) Omniglot Rotation... 4) CIFAR-100 Split (Krizhevsky et al., 2009)... 5) CIFAR-100 Superclass... 6) TinyImageNet (Stanford, 2021)...
Dataset Splits | Yes | For the PMNIST dataset, we keep 10% of the training data from each task for validation; on the other datasets, we keep only 5% of the training data from each task for validation. (A hedged loading-and-split sketch appears after this table.)
Hardware Specification | Yes | All our experiments run on a single NVIDIA V100 GPU.
Software Dependencies | No | The paper mentions building on another method's official code and using various network architectures, but it does not specify software dependencies with version numbers (e.g., PyTorch 1.x or CUDA 11.x).
Experiment Setup | Yes | For each task in PMNIST, we train the network for 5 epochs with a batch size of 10. In the 5-Datasets and Omniglot Rotation experiments, we train each task for a maximum of 100 epochs with an early-termination strategy... For experiments in both datasets, we fix the batch size to 64. (from Appendix A.3)
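
For context on the Pseudocode row, below is a minimal PyTorch-style sketch of the per-layer subnetwork selection that Algorithm 1 (Winning SubNetworks) revolves around: each layer keeps weights and learnable importance scores, the top-c% of scores are binarized into a task mask with a straight-through estimator, and gradients of weights already claimed by earlier tasks are zeroed so reused weights stay intact. The names (GetSubnetMask, SubnetLinear, mask_frozen_gradients), the initialization, and the default keep ratio are illustrative assumptions, not the authors' exact implementation; the linked repository is authoritative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GetSubnetMask(torch.autograd.Function):
    """Binarize weight scores into a top-c% mask; gradients pass straight through."""

    @staticmethod
    def forward(ctx, scores, c_ratio):
        k = int(c_ratio * scores.numel())      # number of weights to keep
        mask = torch.zeros_like(scores)
        _, idx = scores.view(-1).topk(k)       # indices of the highest scores
        mask.view(-1)[idx] = 1.0
        return mask

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pass the gradient to the scores,
        # nothing for the (non-tensor) keep ratio.
        return grad_output, None


class SubnetLinear(nn.Module):
    """Linear layer whose forward pass uses only the selected subnetwork."""

    def __init__(self, in_features, out_features, c_ratio=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        self.scores = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight)
        nn.init.kaiming_uniform_(self.scores)
        self.c_ratio = c_ratio
        # Union of masks selected by previous tasks; those weights stay frozen.
        self.register_buffer("consolidated_mask",
                             torch.zeros(out_features, in_features))

    def forward(self, x):
        task_mask = GetSubnetMask.apply(self.scores, self.c_ratio)
        return F.linear(x, self.weight * task_mask)

    def mask_frozen_gradients(self):
        # Call after loss.backward(): weights reused from earlier tasks keep
        # their old values, which is what makes the method forget-free.
        if self.weight.grad is not None:
            self.weight.grad.mul_(1.0 - self.consolidated_mask)
```

In a full continual-learning loop, the binary mask selected for each task would also be stored so the corresponding subnetwork can be re-applied at inference time.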
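
The Open Datasets and Dataset Splits rows can be illustrated together. The sketch below builds a single Permuted MNIST task from a fixed per-task pixel permutation and holds out 10% of the training data for validation, as reported for PMNIST (5% would be used for the other datasets). The torchvision-based loading, the seeding scheme, and the function name make_pmnist_task are assumptions for illustration, not the paper's code.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms


def make_pmnist_task(seed, root="./data", val_fraction=0.10):
    # One task = MNIST with a fixed pixel permutation determined by the seed.
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(28 * 28, generator=g)

    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Lambda(lambda x: x.view(-1)[perm].view(1, 28, 28)),
    ])
    train_full = datasets.MNIST(root, train=True, download=True, transform=transform)
    test_set = datasets.MNIST(root, train=False, download=True, transform=transform)

    # Hold out a validation split from the task's training data.
    n_val = int(len(train_full) * val_fraction)
    train_set, val_set = random_split(
        train_full, [len(train_full) - n_val, n_val],
        generator=torch.Generator().manual_seed(seed),
    )
    return train_set, val_set, test_set
```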
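
Finally, the Experiment Setup row translates into a per-task training loop: 5 epochs with batch size 10 for PMNIST, and up to 100 epochs with batch size 64 plus early termination for 5-Datasets and Omniglot Rotation. The sketch below assumes early stopping on validation loss with an illustrative patience of 5 epochs; the patience value and the helper functions are not taken from the paper.

```python
import torch
import torch.nn.functional as F


def train_one_epoch(model, loader, optimizer, device="cpu"):
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        F.cross_entropy(model(x), y).backward()
        optimizer.step()


def evaluate(model, loader, device="cpu"):
    model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            total += F.cross_entropy(model(x), y, reduction="sum").item()
            n += y.numel()
    return total / max(n, 1)


def train_task(model, train_loader, val_loader, optimizer,
               max_epochs=100, patience=5):
    # Train one task for up to max_epochs, stopping early when the
    # validation loss has not improved for `patience` epochs.
    best_val, bad_epochs, best_state = float("inf"), 0, None
    for epoch in range(max_epochs):
        train_one_epoch(model, train_loader, optimizer)
        val_loss = evaluate(model, val_loader)
        if val_loss < best_val:
            best_val, bad_epochs = val_loss, 0
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
        else:
            bad_epochs += 1
            if bad_epochs >= patience:   # early termination
                break
    if best_state is not None:
        model.load_state_dict(best_state)
```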