Continual Learning through Retrieval and Imagination

Authors: Zhen Wang, Liu Liu, Yiqun Duan, Dacheng Tao

AAAI 2022, pp. 8594-8602

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that DRI performs significantly better than the existing state-of-the-art continual learning methods and effectively alleviates catastrophic forgetting. (5 Experiments, 5.1 Experimental Setup) We consider a strict evaluation setting (Hsu et al. 2018), which models the sequence of tasks under three scenarios: Task Incremental Learning (Task-IL) splits the training samples into partitions of tasks and requires task identities to select the corresponding classifier at inference time; Class Incremental Learning (Class-IL) sequentially increases the number of classes to be classified without requiring task identities, and is the hardest scenario (van de Ven et al. 2018); Domain Incremental Learning (Domain-IL) observes the same classes during each task while the input distribution continuously changes, and task identities remain unknown. Datasets. We experiment with the following datasets: Split MNIST: the MNIST benchmark (LeCun et al. 1998) is split into 5 tasks by grouping together 2 classes; Split CIFAR-10: CIFAR-10 (Krizhevsky et al. 2009) is split into 5 tasks, each of which introduces 2 classes; Split Tiny-ImageNet: Tiny-ImageNet (Stanford 2015) has 100,000 images across 200 classes, and each task consists of a disjoint subset of 20 classes from these 200 classes.
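For illustration only: the Split CIFAR-10 stream quoted above (5 tasks, each introducing 2 classes) could be assembled roughly as in the sketch below. The paper does not name a framework; torchvision is assumed here, and make_split_cifar10 is a hypothetical helper, not the authors' code.

```python
# Hypothetical sketch: build the Split CIFAR-10 task stream (5 tasks x 2 classes),
# assuming a torchvision data pipeline and the default class order 0..9.
from torchvision import datasets, transforms
from torch.utils.data import Subset

def make_split_cifar10(root="./data", classes_per_task=2):
    train = datasets.CIFAR10(root, train=True, download=True,
                             transform=transforms.ToTensor())
    tasks = []
    for t in range(10 // classes_per_task):
        task_classes = set(range(t * classes_per_task, (t + 1) * classes_per_task))
        idx = [i for i, y in enumerate(train.targets) if y in task_classes]
        tasks.append(Subset(train, idx))  # task t contains only its own classes
    return tasks

tasks = make_split_cifar10()  # tasks[0] holds classes {0, 1}, ..., tasks[4] holds {8, 9}
```

Under Task-IL the task index would also be supplied at test time; Class-IL evaluates over all classes seen so far without it.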
Researcher Affiliation | Collaboration | Zhen Wang (1), Liu Liu (1), Yiqun Duan (2), Dacheng Tao (3,1); 1 The University of Sydney, Australia; 2 University of Technology Sydney, Australia; 3 JD Explore Academy, China. Emails: zwan4121@uni.sydney.edu.au, liu.liu1@sydney.edu.au, yiqun.duan@student.uts.edu.au, dacheng.tao@gmail.com
Pseudocode | Yes | Algorithm 1: Deep Retrieval and Imagination (DRI)
Input: continuum dataset D, memory capacity K
Require: parameters θ, IGAN, scalars α and β, learning rate η
M ← {}  (initialize memory with the empty set)
for t = 1, ..., T do
    θ_pre ← θ
    for (x, y) in D_t do
        (x', y') ← sample(M)
        (x'_a, y'_a) ← (IGAN_g(x'), y')
        ℓ'(x', y') ← α ‖f_θ(x'_a) − f_θ_pre(x'_a)‖²₂ + β ℓ(θ; x'_a, y'_a)
        (x_b, y_b) ← rebalance((x, y), (x'_a, y'_a))
        θ ← θ − η ∇_θ [ℓ(θ; x_b, y_b) + ℓ'(x', y')]  (Section 3.2)
    end for
    IGAN ← updateIGAN(IGAN; D_t, M)  (Section 3.3)
    M ← updateMemory(M; D_t, θ, K)  (Eq. (8))
end for
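Read as code, the quoted pseudocode corresponds roughly to the control flow sketched below. This is not the authors' implementation: sample, rebalance, update_igan, update_memory, and igan.generate are placeholders for components defined in the paper (Sections 3.2-3.3, Eq. (8)), and mse_loss / cross_entropy stand in for the squared L2 term and the generic loss ℓ.

```python
# Control-flow sketch of Algorithm 1 (DRI) in PyTorch style; all helper
# callables are placeholders for components specified in the paper.
import copy
import torch
import torch.nn.functional as F

def train_dri(model, igan, tasks, memory, alpha, beta, lr,
              sample, rebalance, update_igan, update_memory, K):
    for task_loader in tasks:                               # t = 1, ..., T
        model_pre = copy.deepcopy(model)                    # theta_pre <- theta (frozen snapshot)
        model_pre.eval()
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for x, y in task_loader:
            x_r, y_r = sample(memory)                       # retrieve exemplars from memory M
            x_a = igan.generate(x_r)                        # "imagination": augment retrieved samples
            with torch.no_grad():
                out_pre = model_pre(x_a)
            # auxiliary term: distill against the snapshot and replay imagined samples
            # (mse_loss stands in for the squared L2 norm in the pseudocode)
            aux = alpha * F.mse_loss(model(x_a), out_pre) \
                + beta * F.cross_entropy(model(x_a), y_r)
            x_b, y_b = rebalance((x, y), (x_a, y_r))        # balance current and replayed data
            loss = F.cross_entropy(model(x_b), y_b) + aux   # theta update, Section 3.2
            opt.zero_grad()
            loss.backward()
            opt.step()
        igan = update_igan(igan, task_loader, memory)           # Section 3.3
        memory = update_memory(memory, task_loader, model, K)   # Eq. (8)
    return model, igan, memory
```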
Open Source Code | No | The paper does not contain an explicit statement about releasing code or a link to a code repository.
Open Datasets | Yes | Datasets. We experiment with the following datasets: Split MNIST: the MNIST benchmark (LeCun et al. 1998) is split into 5 tasks by grouping together 2 classes; Split CIFAR-10: CIFAR-10 (Krizhevsky et al. 2009) is split into 5 tasks, each of which introduces 2 classes; Split Tiny-ImageNet: Tiny-ImageNet (Stanford 2015) has 100,000 images across 200 classes, and each task consists of a disjoint subset of 20 classes from these 200 classes.
Dataset Splits | Yes | We select the hyper-parameters by performing a grid search on the validation set, which is obtained by sampling 10% of the training set.
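A minimal sketch of the quoted protocol, assuming PyTorch's random_split; the paper does not say how the 10% is drawn, so the random hold-out and fixed seed below are assumptions.

```python
# Hold out 10% of a task's training set as a validation split for grid search.
import torch
from torch.utils.data import random_split

def holdout_validation(task_dataset, val_fraction=0.1, seed=0):
    n_val = int(len(task_dataset) * val_fraction)
    n_train = len(task_dataset) - n_val
    generator = torch.Generator().manual_seed(seed)   # seed is an assumption, not from the paper
    return random_split(task_dataset, [n_train, n_val], generator=generator)
```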
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments (e.g., GPU models or CPU specifications).
Software Dependencies | No | The paper mentions using "the stochastic gradient descent (SGD) optimizer" and "ResNet18 (He et al. 2016)" but does not specify any software names with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
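Because only ResNet18 and SGD are named, any concrete setup is an assumption; one plausible PyTorch/torchvision instantiation (hypothetical head size, learning rate, and momentum) is sketched below.

```python
# Assumed setup: torchvision ResNet18 backbone trained with SGD.
# The number of classes, learning rate, and momentum are illustrative only.
import torch.optim as optim
from torchvision.models import resnet18

backbone = resnet18(num_classes=10)                                 # ResNet18 (He et al. 2016)
optimizer = optim.SGD(backbone.parameters(), lr=0.1, momentum=0.9)  # SGD; values not from the paper
```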
Experiment Setup | No | The paper states that hyper-parameters are selected via a grid search on the validation set and that models are trained with SGD, but it does not explicitly list values for the learning rate, batch size, number of epochs, or other detailed training configurations in the main text.
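To make the grid-search step concrete, a generic sketch follows; the grids over α and β, the fixed learning rate, and the train/eval callables are hypothetical, since the paper reports none of these values.

```python
# Generic grid search over the Algorithm 1 loss weights alpha and beta on the
# held-out validation split; all grids and the learning rate are placeholders.
import itertools

def grid_search(train_fn, eval_fn, train_set, val_set,
                alphas=(0.1, 0.5, 1.0), betas=(0.1, 0.5, 1.0), lr=0.03):
    best_cfg, best_score = None, float("-inf")
    for alpha, beta in itertools.product(alphas, betas):
        model = train_fn(train_set, alpha=alpha, beta=beta, lr=lr)  # user-supplied training routine
        score = eval_fn(model, val_set)                             # e.g. validation accuracy
        if score > best_score:
            best_cfg, best_score = (alpha, beta), score
    return best_cfg
```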