Forget-free Continual Learning with Winning Subnetworks
Authors: Haeyong Kang, Rusty John Lloyd Mina, Sultan Rizky Hikmawan Madjid, Jaehong Yoon, Mark Hasegawa-Johnson, Sung Ju Hwang, Chang D. Yoo
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now validate our method on several benchmark datasets against relevant continual learning baselines. |
| Researcher Affiliation | Collaboration | 1Korea Advanced Institute of Science and Technology (KAIST), South Korea 2University of Illinois at Urbana-Champaign, USA 3AITRICS, South Korea. |
| Pseudocode | Yes | Algorithm 1 Winning SubNetworks (WSN) (see the mask-selection sketch after this table) |
| Open Source Code | Yes | Code is available at https://github.com/ihaeyong/WSN. |
| Open Datasets | Yes | We use six different popular sequential datasets for CL problems with five different neural network architectures as follows: 1) Permuted MNIST (PMNIST): A variant of MNIST (LeCun, 1998)... 2) 5-Datasets... CIFAR-10 (Krizhevsky et al., 2009), MNIST (LeCun, 1998), SVHN (Netzer et al., 2011), Fashion-MNIST (Xiao et al., 2017), and notMNIST (Bulatov, 2011). 3) Omniglot Rotation... 4) CIFAR-100 Split (Krizhevsky et al., 2009)... 5) CIFAR-100 Superclass... 6) TinyImageNet (Stanford, 2021)... |
| Dataset Splits | Yes | In conducting the PMNIST dataset, we keep 10% of the training data from each task for validation. On the other datasets, however, we keep only 5% of training data from each task for validation. |
| Hardware Specification | Yes | All our experiments run on a single-GPU setup of NVIDIA V100. |
| Software Dependencies | No | The paper mentions building on another method's official code and using various network architectures, but it does not specify software dependencies with version numbers (e.g., 'PyTorch 1.x' or 'CUDA 11.x'). |
| Experiment Setup | Yes | For each task in PMNIST, we train the network for 5 epochs with a batch size of 10. In 5-Datasets and Omniglot Rotation experiments, we train each task for a maximum of 100 epochs with the early termination strategy... For experiments in both datasets, we fix the batch size to 64. (from Appendix A.3; see the data-split and training-setup sketch below) |
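The Pseudocode row above points to Algorithm 1 (WSN), which selects a per-task "winning" subnetwork by taking the top-c% of weights ranked by learnable scores and freezing weights already claimed by earlier tasks. The following is a minimal sketch of that mechanism, not the authors' implementation: the class names (`GetSubnet`, `MaskedLinear`), the straight-through estimator detail, and the `c=0.5` sparsity value are assumptions for illustration.

```python
# Hedged sketch of WSN-style top-c% weight-score masking (PyTorch).
# Names, sparsity c=0.5, and the straight-through backward pass are assumptions,
# not taken from the official repository at https://github.com/ihaeyong/WSN.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GetSubnet(torch.autograd.Function):
    """Binary mask over the top-c fraction of weights, ranked by score."""

    @staticmethod
    def forward(ctx, scores, c):
        k = int(c * scores.numel())
        flat = scores.flatten()
        idx = torch.topk(flat, k).indices
        mask = torch.zeros_like(flat)
        mask[idx] = 1.0
        return mask.view_as(scores)

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pass the gradient to the scores unchanged.
        return grad_output, None


class MaskedLinear(nn.Module):
    """Linear layer whose forward pass uses only the selected subnetwork."""

    def __init__(self, in_features, out_features, c=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        self.scores = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight)
        nn.init.kaiming_uniform_(self.scores)
        self.c = c
        # Cumulative mask of weights already used by earlier tasks (kept frozen).
        self.register_buffer("consolidated", torch.zeros_like(self.weight))

    def forward(self, x):
        mask = GetSubnet.apply(self.scores, self.c)
        return F.linear(x, self.weight * mask)

    def freeze_used_weights(self):
        """After finishing a task, add its selected weights to the frozen set."""
        with torch.no_grad():
            mask = GetSubnet.apply(self.scores, self.c)
            self.consolidated = torch.clamp(self.consolidated + mask, max=1.0)

    def zero_frozen_weight_grads(self):
        """Call after loss.backward(): weights owned by old tasks are not updated."""
        if self.weight.grad is not None:
            self.weight.grad *= (1.0 - self.consolidated)
```

Calling `zero_frozen_weight_grads()` between `loss.backward()` and `optimizer.step()` is what makes the method forget-free in this sketch: previously selected weights never move, while the scores remain trainable so later tasks can reuse or extend the subnetwork.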
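The Dataset Splits and Experiment Setup rows report a per-task hold-out (10% of training data for PMNIST, 5% elsewhere) and PMNIST training with 5 epochs at batch size 10. Below is a hedged sketch of such a split and loader construction; the helper name `make_task_loaders`, the seed handling, and the use of `torchvision` MNIST as the base task are illustrative assumptions, not the authors' pipeline.

```python
# Hedged sketch of the per-task validation split and PMNIST loader settings
# reported in the table (10% validation, batch size 10). Illustrative only.
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms


def make_task_loaders(task_dataset, val_fraction=0.10, batch_size=10, seed=0):
    """Hold out a fraction of one task's training data for validation."""
    n_val = int(len(task_dataset) * val_fraction)
    n_train = len(task_dataset) - n_val
    train_set, val_set = random_split(
        task_dataset, [n_train, n_val],
        generator=torch.Generator().manual_seed(seed),
    )
    return (DataLoader(train_set, batch_size=batch_size, shuffle=True),
            DataLoader(val_set, batch_size=batch_size))


# Example: split the MNIST training set for one task; a fixed pixel permutation
# per task would be added on top of this to build PMNIST.
mnist = datasets.MNIST("./data", train=True, download=True,
                       transform=transforms.ToTensor())
train_loader, val_loader = make_task_loaders(mnist)
```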