Scalable and Order-robust Continual Learning with Additive Parameter Decomposition
Authors: Jaehong Yoon, Saehoon Kim, Eunho Yang, Sung Ju Hwang
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our network with APD, APD-Net, on multiple benchmark datasets against state-of-the-art continual learning methods, which it largely outperforms in accuracy, scalability, and order-robustness. |
| Researcher Affiliation | Collaboration | Jaehong Yoon¹, Saehoon Kim², Eunho Yang¹,², and Sung Ju Hwang¹,² (¹KAIST, ²AITRICS, South Korea); {jaehong.yoon, eunhoy, sjhwang82}@kaist.ac.kr, shkim@aitrics.com |
| Pseudocode | Yes | Algorithm 1 Continual learning with Additive Parameter Decomposition (a hedged sketch of the decomposition appears after the table) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | 1) CIFAR-100 Split (Krizhevsky & Hinton, 2009) consists of images from 100 generic object classes. 2) CIFAR-100 Superclass consists of images from 20 superclasses of the CIFAR-100 dataset. 3) Omniglot-rotation (Lake et al., 2015) contains OCR images of 1,200 characters (we only use the training set) from various writing systems for training |
| Dataset Splits | Yes | 1) CIFAR-100 Split (Krizhevsky & Hinton, 2009) consists of images from 100 generic object classes. We split the classes into 10 groups, and consider 10-way multi-class classification in each group as a single task. We use 5 random training/validation/test splits of 4,000/1,000/1,000 samples. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using exponential learning rate decay and weight decay, but does not specify any software libraries or their version numbers (e.g., PyTorch, TensorFlow, scikit-learn versions). |
| Experiment Setup | Yes | We used exponential learning rate decay at each epoch, and weight decay with λ = 1e-4 is applied to all models. All hyperparameters are determined from a validation set. For MNIST-Variation, we used two-layered feedforward networks with 312-128 neurons. Training epochs are 50 for all baselines and APDs. λ1 = [2e-4, 1e-4] on APD. For CIFAR-100 Split and CIFAR-100 Superclass, we used LeNet with 20-50-800-500 neurons. Training epochs are 20 for all models. λ1 = [6e-4, 4e-4]. We equally set λ2 = 100, K = 2 per 5 tasks, and β = 1e-2 for hierarchical knowledge consolidation on MNIST-Variation, CIFAR-100 Split, and CIFAR-100 Superclass. (A configuration sketch collecting these values follows the table.) |
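
The values quoted in the "Experiment Setup" row can be gathered into a small configuration sketch for reference. This is not the authors' code: the key names and grouping are ours, and only the numeric values and architecture descriptions are taken from the row above.

```python
# Hedged summary of the hyperparameters quoted in the "Experiment Setup" row.
# Key names and structure are illustrative; only the values come from the paper excerpt.
EXPERIMENT_CONFIG = {
    "common": {
        "lr_schedule": "exponential decay at each epoch",
        "weight_decay": 1e-4,   # λ
        "lambda2": 100,         # λ2
        "K_per_5_tasks": 2,     # K for hierarchical knowledge consolidation
        "beta": 1e-2,           # β for hierarchical knowledge consolidation
    },
    "mnist_variation": {
        "architecture": "2-layer feedforward, 312-128 neurons",
        "epochs": 50,
        "lambda1": [2e-4, 1e-4],
    },
    "cifar100_split_and_superclass": {
        "architecture": "LeNet, 20-50-800-500",
        "epochs": 20,
        "lambda1": [6e-4, 4e-4],
    },
}
```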
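
The "Pseudocode" row only names Algorithm 1; the algorithm body itself is not reproduced in the table. As a rough illustration of what an additive parameter decomposition could look like, below is a minimal PyTorch sketch in which a per-task weight is formed as a masked shared component plus a sparse task-adaptive component, with an L1 penalty (weight λ1) on the task-adaptive part and an L2 penalty (weight λ2) discouraging drift of the shared part. The class name, masking scheme, and regularizer form are our assumptions, not the paper's exact formulation.

```python
# Hedged sketch of an additive parameter decomposition layer (not the authors' code).
# Assumed form: per-task weight = mask_t * shared + tau_t.
import torch
import torch.nn as nn
import torch.nn.functional as F


class APDLinear(nn.Module):
    """Linear layer whose weight is decomposed into shared + task-adaptive parts."""

    def __init__(self, in_features, out_features, num_tasks):
        super().__init__()
        self.shared = nn.Parameter(0.01 * torch.randn(out_features, in_features))
        # One sparse task-adaptive matrix and one output-wise mask per task.
        self.tau = nn.ParameterList(
            [nn.Parameter(torch.zeros(out_features, in_features)) for _ in range(num_tasks)]
        )
        self.mask_logits = nn.ParameterList(
            [nn.Parameter(torch.zeros(out_features)) for _ in range(num_tasks)]
        )

    def forward(self, x, task_id):
        mask = torch.sigmoid(self.mask_logits[task_id]).unsqueeze(1)  # (out, 1)
        weight = mask * self.shared + self.tau[task_id]
        return F.linear(x, weight)

    def penalty(self, task_id, shared_prev, lambda1, lambda2):
        # Sparsity on the task-adaptive part + drift penalty on the shared part.
        return (lambda1 * self.tau[task_id].abs().sum()
                + lambda2 * (self.shared - shared_prev.detach()).pow(2).sum())
```

In a training step for task t, `penalty(t, shared_prev, lambda1, lambda2)` would be added to the task loss, with `shared_prev` a frozen copy of the shared weights from before the task started. The hierarchical knowledge consolidation controlled by K and β in the paper is not sketched here.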