Revisiting Neural Networks for Continual Learning: An Architectural Perspective

Authors: Aojun Lu, Tao Feng, Hangjie Yuan, Xiaotian Song, Yanan Sun

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental validation across various CL settings and scenarios demonstrates that improved architectures are parameter-efficient, achieving state-of-the-art performance in CL while being 86%, 61%, and 97% more compact in terms of parameters than the naive CL architecture in Task IL and Class IL.
Researcher Affiliation | Academia | Aojun Lu (Sichuan University), Tao Feng (Tsinghua University), Hangjie Yuan (Zhejiang University), Xiaotian Song (Sichuan University), and Yanan Sun (Sichuan University)
Pseudocode | No | The paper describes methods and strategies in text but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/byyx666/ArchCraft.
Open Datasets | Yes | Benchmark. For the CL scenarios mentioned above, i.e., Task IL and Class IL, we assess network performance on CIFAR-100. Benchmark. We choose CIFAR-100 and ImageNet-100 to evaluate the ArchCraft-guided architectures.
Dataset Splits | No | The paper mentions training and evaluation on a 'test set' but does not explicitly provide details for a separate validation split, such as percentages, counts, or a specific strategy for creating one.
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions following 'PyCIL [Zhou et al., 2023]' for training in Class IL, but it does not provide specific version numbers for PyCIL or any other software dependencies.
Experiment Setup | Yes | Implementation Details. For Task IL, we train the model for 60 epochs in the first task and 20 epochs in the subsequent tasks. For Class IL, we follow PyCIL [Zhou et al., 2023] to train the model for 200 epochs in the first task and 70 epochs in the subsequent tasks. In Task IL, the network is trained using a vanilla SGD optimizer, while in Class IL, a replay buffer containing 2,000 examples is employed.
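As context for the Open Datasets and Dataset Splits rows above: evaluating CIFAR-100 under Task IL and Class IL requires partitioning the 100 classes into a sequence of tasks. The sketch below shows one common way to build such a split, assuming torchvision and an equal 10-task partition; the task count and class order are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): splitting CIFAR-100 into a sequence of
# incremental tasks, as commonly done for Task IL / Class IL benchmarks.
# The number of tasks (10) and the fixed class order are assumptions for illustration.
import numpy as np
from torchvision.datasets import CIFAR100

NUM_TASKS = 10                       # assumed: 10 tasks of 10 classes each
CLASSES_PER_TASK = 100 // NUM_TASKS

train_set = CIFAR100(root="./data", train=True, download=True)
targets = np.array(train_set.targets)

# Fixed class order (identity here; shuffled orders are also common).
class_order = np.arange(100)

task_indices = []
for t in range(NUM_TASKS):
    task_classes = class_order[t * CLASSES_PER_TASK:(t + 1) * CLASSES_PER_TASK]
    idx = np.where(np.isin(targets, task_classes))[0]
    task_indices.append(idx)         # sample indices belonging to task t
```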
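The Experiment Setup row quotes a concrete schedule: 60/20 epochs (first/subsequent tasks) with vanilla SGD for Task IL, 200/70 epochs via PyCIL for Class IL, and a 2,000-example replay buffer. The sketch below illustrates how such a schedule and buffer might be wired up in PyTorch; `model` and `task_loaders` are placeholders, and the learning rate, momentum, and reservoir-style eviction policy are assumptions rather than details from the paper.

```python
# Minimal sketch (not the authors' code) of the quoted training schedule.
# `model` and `task_loaders` are placeholders; lr and momentum are assumed values.
import random
import torch
import torch.nn as nn

def train_task_il(model, task_loaders, device="cuda"):
    """Task IL: 60 epochs on the first task, 20 on each subsequent task, vanilla SGD."""
    criterion = nn.CrossEntropyLoss()
    for task_id, loader in enumerate(task_loaders):
        epochs = 60 if task_id == 0 else 20  # schedule quoted from the paper
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # assumed
        for _ in range(epochs):
            for images, labels in loader:
                images, labels = images.to(device), labels.to(device)
                optimizer.zero_grad()
                loss = criterion(model(images), labels)
                loss.backward()
                optimizer.step()

class ReplayBuffer:
    """Fixed-size exemplar buffer for Class IL (2,000 examples in the paper).
    Reservoir sampling is one simple eviction policy; the paper's exact policy is not quoted."""
    def __init__(self, capacity=2000):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, example):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = example
```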