Residual Continual Learning
Authors: Janghyeon Lee, Donggyu Joo, Hyeong Gwon Hong, Junmo Kim (pp. 4553-4560)
AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method for sequential learning of image classification tasks and compare it with other methods, including fine-tuning, LwF, and Mean-IMM, that do not refer to any source task information for fair comparisons. Mode-IMM is not compared in the experiment because it requires the Fisher information matrix, which cannot be obtained without source data. The source and target tasks are to classify the CIFAR-10, CIFAR-100 (Krizhevsky 2009), or SVHN (Netzer et al. 2011) dataset. A pre-activation residual network of 32 layers without bottlenecks (He et al. 2016b) is used. |
| Researcher Affiliation | Academia | Janghyeon Lee,1 Donggyu Joo,1 Hyeong Gwon Hong,2 Junmo Kim1 1School of Electrical Engineering, KAIST 2Graduate School of AI, KAIST {wkdgus9305, jdg105, honggudrnjs, junmo.kim}@kaist.ac.kr |
| Pseudocode | Yes | Algorithm 1: Residual Continual Learning |
| Open Source Code | No | The paper does not contain any explicit statement about releasing open-source code or a link to a code repository. |
| Open Datasets | Yes | The source and target tasks are to classify the CIFAR-10, CIFAR-100 (Krizhevsky 2009), or SVHN (Netzer et al. 2011) dataset. |
| Dataset Splits | No | The paper mentions 'target validation data' but does not provide specific details on how the dataset was split into training, validation, and test sets (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running its experiments. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., specific deep learning frameworks or programming language versions). |
| Experiment Setup | Yes | For the CIFAR datasets, data augmentation and hyperparameter settings are the same as those in (He et al. 2016b). Training images are horizontally flipped with a probability of 0.5 and randomly cropped to 32×32 from 40×40 zero-padded images during training. SGD with a momentum of 0.9, a minibatch size of 128, and a weight decay of λdec = 0.0001 optimizes networks until 64000 iterations. [...] The learning rate starts from 0.1 and is multiplied by 0.1 at 32000 and 48000 iterations. The He initialization method (He et al. 2015) is used to initialize source networks. Combination parameters (αs, αt) in ResCL are initialized to (1/2·1, 1/2·1) in order to balance the original and new features at the early stage of training. |
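
The "Experiment Setup" row above can be restated as a training configuration. Below is a minimal sketch of that configuration, assuming a PyTorch implementation (the paper does not state its framework); the placeholder model stands in for the 32-layer pre-activation residual network without bottlenecks (He et al. 2016b) and is not the authors' architecture.

```python
# Sketch of the quoted CIFAR training setup, assuming PyTorch.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Augmentation described in the paper: horizontal flip with p=0.5 and a
# random 32x32 crop from 40x40 zero-padded images (i.e. 4-pixel padding).
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
])
train_set = datasets.CIFAR10("data", train=True, download=True,
                             transform=train_transform)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

# Placeholder network; the paper uses a 32-layer pre-activation ResNet
# without bottlenecks (He et al. 2016b), omitted here for brevity.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

# SGD with momentum 0.9 and weight decay 1e-4; the learning rate starts at
# 0.1 and is multiplied by 0.1 at 32,000 and 48,000 of 64,000 iterations.
optimizer = optim.SGD(model.parameters(), lr=0.1,
                      momentum=0.9, weight_decay=1e-4)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer,
                                           milestones=[32000, 48000],
                                           gamma=0.1)
criterion = nn.CrossEntropyLoss()

iteration = 0
while iteration < 64000:
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        scheduler.step()  # schedule is stepped per iteration, not per epoch
        iteration += 1
        if iteration >= 64000:
            break
```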
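
The same row mentions combination parameters (αs, αt) initialized to (1/2·1, 1/2·1). Below is a minimal sketch of one way such parameters could be realized, assuming they are per-channel learnable vectors that linearly mix the original (source) and new feature maps; the exact parameterization and placement follow Algorithm 1 in the paper and are not specified here.

```python
# Hedged sketch of combination parameters balancing two feature streams.
import torch
from torch import nn

class FeatureCombination(nn.Module):
    """Linearly combines original (source) and new features per channel."""

    def __init__(self, num_channels: int):
        super().__init__()
        # Both vectors start at 1/2 * 1 so the two feature streams are
        # balanced at the early stage of training, as quoted above.
        self.alpha_s = nn.Parameter(torch.full((num_channels,), 0.5))
        self.alpha_t = nn.Parameter(torch.full((num_channels,), 0.5))

    def forward(self, feat_source: torch.Tensor,
                feat_target: torch.Tensor) -> torch.Tensor:
        # Broadcast the per-channel weights over (N, C, H, W) feature maps.
        return (self.alpha_s.view(1, -1, 1, 1) * feat_source
                + self.alpha_t.view(1, -1, 1, 1) * feat_target)

# Usage example with dummy 16-channel feature maps.
combine = FeatureCombination(num_channels=16)
fs, ft = torch.randn(2, 16, 32, 32), torch.randn(2, 16, 32, 32)
out = combine(fs, ft)  # same shape as the inputs: (2, 16, 32, 32)
```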