Beyond Not-Forgetting: Continual Learning with Backward Knowledge Transfer
Authors: Sen Lin, Li Yang, Deliang Fan, Junshan Zhang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental studies show that CUBER can even achieve positive backward knowledge transfer on several existing CL benchmarks for the first time without data replay, where the related baselines still suffer from catastrophic forgetting (negative backward knowledge transfer). The superior performance of CUBER on the backward knowledge transfer also leads to higher accuracy accordingly. |
| Researcher Affiliation | Academia | Sen Lin, School of ECEE, Arizona State University (slin70@asu.edu); Li Yang, School of ECEE, Arizona State University (lyang166@asu.edu); Deliang Fan, School of ECEE, Arizona State University (dfan@asu.edu); Junshan Zhang, Department of ECE, University of California, Davis (jazh@ucdavis.edu) |
| Pseudocode | Yes | Algorithm 1 Continual learning with backward knowledge transfer (CUBER) |
| Open Source Code | Yes | We include the code in the supplemental material. |
| Open Datasets | Yes | Datasets. We evaluate the performance of CUBER on four standard CL benchmarks. (1) Permuted MNIST: a variant of the MNIST dataset [14] where random permutations are applied to the input pixels. ... (2) Split CIFAR-100: we divide the CIFAR-100 dataset [13] ... (3) 5-Datasets: we consider a sequence of 5 datasets, i.e., CIFAR-10, MNIST, SVHN [21], notMNIST [2], Fashion MNIST [28], and the classification problem on each dataset is a task. (4) Split miniImageNet: we divide the miniImageNet dataset [27]... (A construction sketch for the Permuted MNIST benchmark appears below the table.) |
| Dataset Splits | No | While the paper mentions "early termination based on the validation loss" indicating the use of a validation set, it does not provide specific details on the dataset splits (e.g., percentages or sample counts) for training, validation, and testing. |
| Hardware Specification | No | The main text of the paper, including the 'Network and training details' section, does not specify the hardware used for the experiments (e.g., specific GPU or CPU models). While the checklist states 'See the appendix' for this information, the appendix content is not provided. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., names of libraries or frameworks with their versions) used for the experiments. |
| Experiment Setup | Yes | For Permuted MNIST, we consider a 3-layer fully-connected network including 2 hidden layers with 100 units. We train the network for 5 epochs with a batch size of 10 for every task. For Split CIFAR-100, we use a version of 5-layer AlexNet by following [25, 17]. When learning each task, we train the network for a maximum of 200 epochs with early termination based on the validation loss, and use a batch size of 64. ... In the experiments, we set ϵ1 = 0.5. (A training-configuration sketch appears below the table.) |
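
The Open Datasets row quotes the paper's description of the Permuted MNIST benchmark but not its generation code. Below is a minimal sketch of how such a task sequence is typically built: each task reuses MNIST with a fixed random permutation of the 784 input pixels. The number of tasks, the seed, and the helper names (`make_permuted_mnist_tasks`, `apply_perm`) are assumptions for illustration, not the authors' released code.

```python
import numpy as np
import torch
from torchvision import datasets, transforms

NUM_TASKS = 10  # assumed task count for illustration; not taken from the excerpt


def make_permuted_mnist_tasks(root="./data", num_tasks=NUM_TASKS, seed=0):
    """Build a sequence of Permuted MNIST tasks (assumed construction)."""
    rng = np.random.RandomState(seed)
    base_train = datasets.MNIST(root, train=True, download=True,
                                transform=transforms.ToTensor())
    base_test = datasets.MNIST(root, train=False, download=True,
                               transform=transforms.ToTensor())
    tasks = []
    for _ in range(num_tasks):
        # One fixed random permutation of the 28x28 = 784 pixels per task,
        # shared by that task's train and test splits.
        perm = torch.from_numpy(rng.permutation(28 * 28))
        tasks.append({"perm": perm, "train": base_train, "test": base_test})
    return tasks


def apply_perm(x, perm):
    # x: (batch, 1, 28, 28) -> flattened to (batch, 784) and pixel-permuted.
    return x.view(x.size(0), -1)[:, perm]
```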
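The Experiment Setup row fixes the Permuted MNIST architecture and schedule (a 3-layer fully-connected network with two 100-unit hidden layers, 5 epochs, batch size 10 per task) but not the optimizer or learning rate. The sketch below wires those stated values into a plain per-task training loop; the SGD optimizer and learning rate are assumptions, and CUBER's gradient-projection update itself is deliberately omitted.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader


class MLP(nn.Module):
    """3-layer fully-connected network: two 100-unit hidden layers (from the table)."""

    def __init__(self, in_dim=28 * 28, hidden=100, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        return self.net(x.view(x.size(0), -1))


def train_task(model, train_set, epochs=5, batch_size=10, lr=0.01):
    # Epochs and batch size follow the quoted setup; the optimizer choice and
    # learning rate are assumptions (not stated in the excerpt).
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            # CUBER would modify gradients here (its backward-transfer update);
            # that step is omitted in this plain-SGD sketch.
            opt.step()
    return model
```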