Theory on Forgetting and Generalization of Continual Learning

Authors: Sen Lin, Peizhong Ju, Yingbin Liang, Ness Shroff

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "More interestingly, by conducting experiments on real datasets using deep neural networks (DNNs), we show that some of these insights even go beyond the linear models and can be carried over to practical setups."
Researcher Affiliation | Academia | "Department of ECE, Department of CSE, The Ohio State University, Columbus, OH, USA. Correspondence to: Sen Lin <lin.4282@osu.edu>, Peizhong Ju <ju.171@osu.edu>."
Pseudocode | No | The paper describes its procedures in text and equations but does not include explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not state that its code is open source and provides no link to a code repository for the described methodology.
Open Datasets | Yes | "We conduct experiments on MNIST (Le Cun et al., 1989) using a convolutional neural network... We consider two standard benchmarks in CL: (1) PMNIST... (2) Split CIFAR-100 (Krizhevsky et al., 2009)"
Dataset Splits | No | "For each task, we randomly select 200 samples for training and 1000 samples for testing. ... For Split CIFAR-100, we use a version of 5-layer AlexNet, and train the network for a maximum of 200 epochs with early stopping for each task." The mention of early stopping implies a validation set, but the paper does not specify its size or how it is carved out of the data.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using SGD and specific network architectures but does not report version numbers for any software dependencies such as programming languages or libraries.
Experiment Setup | Yes | "We learn each task by using SGD with a learning rate of 0.1 for 600 epochs. ... we use a 3-layer fully-connected network with 2 hidden layers of 100 units for PMNIST, and train the network for 5 epochs with a batch size of 10 for each task. For Split CIFAR-100, we use a version of 5-layer AlexNet, and train the network for a maximum of 200 epochs with early stopping for each task." (The PMNIST data and training configuration are sketched in the code after this table.)
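
The "Dataset Splits" row quotes a per-task split of 200 training and 1000 test samples but no code is released, so the sketch below is only one plausible reconstruction. It builds a Permuted-MNIST task in PyTorch; the use of torchvision, the pixel-permutation task construction, and the helper name make_pmnist_task are illustrative assumptions, and whether the 200/1000 split applies to the PMNIST benchmark or to the earlier MNIST experiment is not settled by the quoted text.

```python
# Sketch (not the authors' code) of a per-task split with 200 training and
# 1000 test samples, using Permuted MNIST as the task construction.
import torch
from torch.utils.data import TensorDataset
from torchvision import datasets, transforms

def make_pmnist_task(seed, n_train=200, n_test=1000, root="./data"):
    """Build one Permuted-MNIST task: a fixed random pixel permutation plus
    a random subsample of n_train training and n_test test images."""
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(28 * 28, generator=g)  # task-specific permutation

    to_tensor = transforms.ToTensor()
    train = datasets.MNIST(root, train=True, download=True, transform=to_tensor)
    test = datasets.MNIST(root, train=False, download=True, transform=to_tensor)

    def subsample(ds, n):
        idx = torch.randperm(len(ds), generator=g)[:n].tolist()
        xs = torch.stack([ds[i][0].view(-1)[perm] for i in idx])  # flatten + permute
        ys = torch.tensor([ds[i][1] for i in idx])
        return TensorDataset(xs, ys)

    return subsample(train, n_train), subsample(test, n_test)
```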
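The "Experiment Setup" row fixes the PMNIST architecture and schedule (3-layer fully-connected network with two hidden layers of 100 units, 5 epochs, batch size 10 per task). The minimal sketch below instantiates that configuration; the learning rate, the cross-entropy loss, and the ReLU activations are assumptions not pinned down by the quoted text (the quoted rate of 0.1 is stated for a different experiment), and the "version of 5-layer AlexNet" for Split CIFAR-100 is omitted because it is not specified precisely enough to reconstruct.

```python
# Minimal sketch of the PMNIST network and per-task training loop described
# in the "Experiment Setup" row. Learning rate, loss, and activations are
# assumed values, not taken from the paper.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def make_mlp(in_dim=28 * 28, hidden=100, n_classes=10):
    # Two hidden layers of 100 units, as described for PMNIST.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, n_classes),
    )

def train_one_task(model, train_set, epochs=5, batch_size=10, lr=0.1):
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr)  # lr is an assumed value
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model
```

Used sequentially, e.g. `model = make_mlp()` followed by `train_one_task(model, make_pmnist_task(t)[0])` for each task index `t`, this mirrors the plain per-task training the quote describes, with no forgetting-mitigation mechanism added.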