Theory on Forgetting and Generalization of Continual Learning
Authors: Sen Lin, Peizhong Ju, Yingbin Liang, Ness Shroff
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | More interestingly, by conducting experiments on real datasets using deep neural networks (DNNs), we show that some of these insights even go beyond the linear models and can be carried over to practical setups. |
| Researcher Affiliation | Academia | 1Department of ECE, 2Department of CSE, The Ohio State University, Columbus OH, USA. Correspondence to: Sen Lin <lin.4282@osu.edu>, Peizhong Ju <ju.171@osu.edu>. |
| Pseudocode | No | The paper describes procedures in text and equations but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement about making its code open-source or any links to a code repository for the methodology described. |
| Open Datasets | Yes | We conduct experiments on MNIST (Le Cun et al., 1989) using a convolutional neural network... We consider two standard benchmarks in CL: (1) PMNIST... (2) Split CIFAR-100 (Krizhevsky et al., 2009) |
| Dataset Splits | No | For each task, we randomly select 200 samples for training and 1000 samples for testing. ... For Split CIFAR-100, we use a version of 5-layer AlexNet, and train the network for a maximum of 200 epochs with early stopping for each task. The paper mentions 'early stopping', which implies a validation set, but it does not specify the size of that validation set or how it is split from the data. (A hedged sketch of the per-task 200/1000 split follows the table.) |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using SGD and specific network architectures but does not provide specific version numbers for any software dependencies like programming languages or libraries. |
| Experiment Setup | Yes | We learn each task by using SGD with a learning rate of 0.1 for 600 epochs. ... we use a 3-layer fully-connected network with 2 hidden layers of 100 units for PMNIST, and train the network for 5 epochs with a batch size of 10 for each task. For Split CIFAR-100, we use a version of 5-layer AlexNet, and train the network for a maximum of 200 epochs with early stopping for each task. (A hedged sketch of the PMNIST training protocol follows the table.) |
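
Since the paper releases no code, the following is a minimal sketch of the per-task data construction quoted above: each task draws a random subset of 200 training and 1000 test samples. The use of torchvision's MNIST loader, the pixel-permutation task construction, and the number of tasks are assumptions made for illustration, not details taken from the paper.

```python
# Hedged sketch of the quoted 200/1000 per-task subsample on MNIST.
# Assumptions (not from the paper's code): torchvision MNIST source,
# Permuted-MNIST-style task construction, 5 tasks, seed 0.
import numpy as np
import torch
from torchvision import datasets, transforms

def make_permuted_task(train_set, test_set, rng, n_train=200, n_test=1000):
    """Build one task: a task-specific pixel permutation plus a random
    200/1000 train/test subsample, as quoted from the paper."""
    perm = torch.from_numpy(rng.permutation(28 * 28))        # fixed permutation for this task
    tr_idx = rng.choice(len(train_set), size=n_train, replace=False)
    te_idx = rng.choice(len(test_set), size=n_test, replace=False)

    def gather(ds, idx):
        xs = torch.stack([ds[int(i)][0].view(-1)[perm] for i in idx])  # flatten, then permute pixels
        ys = torch.tensor([ds[int(i)][1] for i in idx])
        return xs, ys

    return gather(train_set, tr_idx), gather(test_set, te_idx)

transform = transforms.ToTensor()
train_set = datasets.MNIST("data", train=True, download=True, transform=transform)
test_set = datasets.MNIST("data", train=False, download=True, transform=transform)
rng = np.random.default_rng(0)
tasks = [make_permuted_task(train_set, test_set, rng) for _ in range(5)]  # task count is illustrative
```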
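
Likewise, a minimal sketch of the quoted PMNIST protocol: a 3-layer fully connected network with two hidden layers of 100 units, trained sequentially on each task with plain SGD for 5 epochs at batch size 10. The learning rate of 0.1 is borrowed from the other quoted setting as an assumption; the paper does not state it for PMNIST, and no CL-specific regularizer is modeled here.

```python
# Hedged sketch of sequential (vanilla fine-tuning) training over the tasks
# built in the previous sketch. Learning rate 0.1 is an assumption carried
# over from the other quoted setup.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def mlp(in_dim=784, hidden=100, n_classes=10):
    # 3-layer fully connected network with two hidden layers of 100 units
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, n_classes),
    )

def train_continually(tasks, epochs=5, batch_size=10, lr=0.1):
    """Plain SGD fine-tuning on each task in sequence, matching the
    quoted 5-epoch, batch-size-10 protocol."""
    model = mlp()
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for (x_tr, y_tr), _ in tasks:          # tasks as returned by make_permuted_task
        loader = DataLoader(TensorDataset(x_tr, y_tr), batch_size=batch_size, shuffle=True)
        for _ in range(epochs):
            for xb, yb in loader:
                opt.zero_grad()
                loss_fn(model(xb), yb).backward()
                opt.step()
    return model

model = train_continually(tasks)
```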