Does Continual Learning Equally Forget All Parameters?
Authors: Haiyan Zhao, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments on several benchmarks of class- and domain-incremental CL, FPF consistently improves existing CL methods by a large margin, and k-FPF further excels in efficiency without degrading the accuracy. |
| Researcher Affiliation | Academia | ¹University of Technology Sydney, ²University of Maryland. Correspondence to: Haiyan Zhao <Haiyan.Zhao2@student.uts.edu.au>, Tianyi Zhou <tianyi@umd.edu>, Guodong Long, Jing Jiang, Chengqi Zhang <{guodong.long, jing.jiang, chengqi.zhang}@uts.edu.au>. |
| Pseudocode | Yes | The detailed algorithm of FPF, k-FPF and reservoir sampling are shown in Alg. 1, Alg. 2 and Alg. 3. (A generic reservoir-sampling sketch follows the table.) |
| Open Source Code | No | The paper does not include an explicit statement about open-sourcing the code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We conduct class-IL experiments on Seq-MNIST, Seq-OrganAMNIST, Seq-PathMNIST, Seq-CIFAR-10, and Seq-Tiny-ImageNet. ... Seq-MNIST (Seq-CIFAR-10) are generated by splitting the 10 classes in MNIST (LeCun et al., 1998) (CIFAR-10 (Krizhevsky et al., 2009)) into five binary classification tasks. Seq-OrganAMNIST and Seq-PathMNIST are generated by splitting OrganAMNIST or PathMNIST from MedMNIST (Yang et al., 2021), a medical image classification benchmark. ... For domain-IL experiments, we use PACS dataset (Li et al., 2017), which is widely used for domain generalization. (A task-split sketch follows the table.) |
| Dataset Splits | Yes | Since the epochs are reduced, we re-tune the learning rate and hyper-parameters for different scenarios by performing a grid search on a validation set of 10% samples drawn from the original training set. (A hold-out split sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or specific machine configurations. |
| Software Dependencies | No | The paper mentions software components like SGD and network architectures (ResNet-18, VGG-11) but does not provide specific version numbers for these or other relevant software libraries and environments. |
| Experiment Setup | Yes | Except for Seq-MNIST, where the number of training epochs for each task is 1, the number of training epochs for each task in all other datasets is set as 5, which is enough for CL. ... For both FPF and k-FPF, we use the same optimizer, i.e., SGD with the cosine-annealing learning rate schedule, and finetune the selected parameters with a batch size of 32 for all scenarios. The finetuning steps for FPF and k-FPF are 300 and 100, respectively. ... Please refer to Appendix N for the hyper-parameters we explored. (A finetuning-loop sketch follows the table.) |
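
The "Pseudocode" row points to a reservoir-sampling routine (Alg. 3) used to maintain the replay buffer. The paper's exact variant is not reproduced here; the following is a minimal sketch of the standard reservoir-sampling update that such a buffer typically uses.

```python
import random

def reservoir_update(buffer, example, n_seen, buffer_size):
    """Standard reservoir-sampling update (a sketch, not the paper's Alg. 3 verbatim).

    After processing n_seen + 1 examples, each example remains in the buffer
    with probability buffer_size / (n_seen + 1).
    """
    if len(buffer) < buffer_size:
        buffer.append(example)            # buffer not full yet: always store
    else:
        j = random.randint(0, n_seen)     # uniform over all examples seen so far
        if j < buffer_size:
            buffer[j] = example           # overwrite a uniformly chosen slot
```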
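The "Open Datasets" row describes building class-incremental benchmarks by splitting a dataset's classes into sequential tasks. Below is a minimal sketch of the Seq-CIFAR-10 construction (five binary tasks), assuming torchvision; the authors' exact data pipeline may differ.

```python
import torchvision
import torchvision.transforms as T
from torch.utils.data import Subset

def make_seq_cifar10(root="./data", train=True):
    """Split CIFAR-10's ten classes into five sequential binary tasks:
    (0, 1), (2, 3), (4, 5), (6, 7), (8, 9)."""
    dataset = torchvision.datasets.CIFAR10(
        root=root, train=train, download=True, transform=T.ToTensor()
    )
    tasks = []
    for t in range(5):
        task_classes = {2 * t, 2 * t + 1}
        indices = [i for i, y in enumerate(dataset.targets) if y in task_classes]
        tasks.append(Subset(dataset, indices))
    return tasks

# Usage: train the continual learner on tasks[0], then tasks[1], and so on.
```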
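The "Dataset Splits" row notes that hyper-parameters were tuned by grid search on a validation set of 10% of the training samples. A sketch of such a hold-out split, assuming PyTorch's random_split (the authors' exact splitting code is not given):

```python
import torch
from torch.utils.data import random_split

def train_val_split(train_set, val_fraction=0.1, seed=0):
    """Hold out a fraction of the training set for hyper-parameter grid search."""
    n_val = int(len(train_set) * val_fraction)
    generator = torch.Generator().manual_seed(seed)  # fixed seed for a reproducible split
    return random_split(train_set, [len(train_set) - n_val, n_val], generator=generator)
```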
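Finally, the "Experiment Setup" row specifies SGD with a cosine-annealing learning-rate schedule, a finetuning batch size of 32, and 300 (FPF) or 100 (k-FPF) finetuning steps on buffer samples. The PyTorch sketch below reflects those settings; `select_finetune_params` is an illustrative stand-in, since the paper's actual choice of which parameters to finetune is method-specific.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def select_finetune_params(model):
    """Illustrative stand-in: collect normalization-layer and final-linear-layer
    parameters. The paper's actual selection of sensitive parameters may differ."""
    selected, last_linear = [], None
    for module in model.modules():
        if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.LayerNorm)):
            selected.extend(module.parameters())
        elif isinstance(module, nn.Linear):
            last_linear = module
    if last_linear is not None:
        selected.extend(last_linear.parameters())
    return selected

def fpf_finetune(model, buffer_loader, steps=300, lr=0.01):
    """Finetune the selected parameters on replay-buffer batches (batch size 32
    in the paper) for `steps` SGD steps with a cosine-annealing learning rate."""
    optimizer = torch.optim.SGD(select_finetune_params(model), lr=lr)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=steps)
    data_iter = iter(buffer_loader)
    for _ in range(steps):
        try:
            x, y = next(data_iter)
        except StopIteration:                  # restart the buffer loader if exhausted
            data_iter = iter(buffer_loader)
            x, y = next(data_iter)
        loss = F.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()
    return model
```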