reproducibilityindex.ai

Continual evaluation for lifelong learning: Identifying the stability gap

Authors: Matthias De Lange, Gido M van de Ven, Tinne Tuytelaars

ICLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically we show that experience replay, constraint-based replay, knowledge-distillation, and parameter regularization methods are all prone to the stability gap; and that the stability gap can be observed in class-, task-, and domain-incremental learning benchmarks. Additionally, a controlled experiment shows that the stability gap increases when tasks are more dissimilar. Finally, by disentangling gradients into plasticity and stability components, we propose a conceptual explanation for the stability gap.
Researcher Affiliation	Academia	Matthias De Lange, Gido M. van de Ven & Tinne Tuytelaars KU Leuven
Pseudocode	No	The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code	Yes	Contributions in this work are along three main lines, with code publicly available. ... Code: https://github.com/mattdl/ContinualEvaluation
Open Datasets	Yes	For experiments on class-incremental learning we use three standard datasets: MNIST (LeCun & Cortes, 2010) consists of grayscale handwritten digits, CIFAR10 (Krizhevsky et al., 2009) contains images from a range of vehicles and animals, and MiniImagenet (Vinyals et al., 2016) is a subset of Imagenet (Russakovsky et al., 2015). ... For domain-incremental learning we consider drastic domain changes in Mini-DomainNet (Zhou et al., 2021), a scaled-down subset of 126 classes of DomainNet (Peng et al., 2019) ... Synthetic Speech Commands dataset (Buchner, 2017)
Dataset Splits	Yes	To make sure our worst-case analysis applies to the best-case configuration for ER, we run a gridsearch over different hyperparameters and select the entry with the highest stability-plasticity trade-off metric ACC on the held-out evaluation data (Lopez-Paz & Ranzato, 2017).
Hardware Specification	No	All results were performed on a compute cluster with a range of NVIDIA GPUs.
Software Dependencies	No	The experiments were based on the Avalanche framework (Lomonaco et al., 2021) in PyTorch (Paszke et al., 2019). The versions of Avalanche and PyTorch are not specified.
Experiment Setup	Yes	Setup. We employ continual evaluation with evaluation periodicity in range ρ_eval ∈ {1, 10, 10^2, 10^3} and subset size 1k per evaluation task ... Split-MNIST uses an MLP with 2 hidden layers of 400 units. Split-CIFAR10, Split-MiniImagenet and Mini-DomainNet use a slim version of Resnet18 (Lopez-Paz & Ranzato, 2017). SGD optimization is used with 0.9 momentum. For all experiments, the learning rate η for the gradient-based updates is considered as hyperparameter in the set η ∈ {0.1, 0.01, 0.001, 0.0001}. A fixed batch size is used for all benchmarks, with 128 for the larger-scale Split-MiniImagenet and Mini-DomainNet, and 256 for the smaller Split-MNIST and Split-CIFAR10. ... We indicate the selected hyperparameters (η, α, |M|) per dataset here: Split-MNIST (0.01, 0.3, 2·10^3), Split-CIFAR10 (0.1, 0.7, 10^3), Split-MiniImagenet (0.1, 0.5, 10^4), Mini-DomainNet (0.1, 0.3, 10^3).
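
The reported setup can be collected into a small configuration sketch, which may help anyone trying to reproduce the runs. This is not taken from the authors' repository. The class name `SetupConfig`, the field names `alpha` and `memory_size`, and the helper `eval_iterations` are hypothetical. The excerpt defines η as the learning rate but does not define α or |M|, so those values are stored as reported without interpretation. The memory values assume the extracted exponents read as 2·10^3, 10^3, and 10^4. The helper also assumes evaluation is triggered after every ρ_eval-th update, counting from 1; the excerpt does not state the exact convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetupConfig:
    """Hypothetical container for the hyperparameters quoted in the table."""
    lr: float            # η: learning rate (defined in the excerpt)
    alpha: float         # α: reported value; meaning not defined in the excerpt
    memory_size: int     # |M|: reported value; meaning not defined in the excerpt
    batch_size: int      # 256 for the smaller benchmarks, 128 for the larger ones
    momentum: float = 0.9  # SGD momentum, as quoted


# Search grids quoted in the excerpt.
LR_GRID = (0.1, 0.01, 0.001, 0.0001)
EVAL_PERIODICITIES = (1, 10, 100, 1000)

# Selected (η, α, |M|) per benchmark, assuming exponents were lost in extraction.
SELECTED = {
    "Split-MNIST": SetupConfig(lr=0.01, alpha=0.3, memory_size=2_000, batch_size=256),
    "Split-CIFAR10": SetupConfig(lr=0.1, alpha=0.7, memory_size=1_000, batch_size=256),
    "Split-MiniImagenet": SetupConfig(lr=0.1, alpha=0.5, memory_size=10_000, batch_size=128),
    "Mini-DomainNet": SetupConfig(lr=0.1, alpha=0.3, memory_size=1_000, batch_size=128),
}


def eval_iterations(total_iterations: int, rho_eval: int) -> list[int]:
    """Return the 1-indexed update counts after which evaluation would run.

    Assumption: evaluation fires after every `rho_eval`-th update.
    """
    if rho_eval < 1:
        raise ValueError("rho_eval must be a positive integer")
    return list(range(rho_eval, total_iterations + 1, rho_eval))
```

For example, `eval_iterations(25, 10)` yields the update counts 10 and 20, while `rho_eval=1` evaluates after every single update, which is the finest setting in the quoted range.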