Continual Learning through Retrieval and Imagination
Authors: Zhen Wang, Liu Liu, Yiqun Duan, Dacheng Tao (pp. 8594–8602)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that DRI performs significantly better than the existing state-of-the-art continual learning methods and effectively alleviates catastrophic forgetting. 5 Experiments 5.1 Experimental Setup We consider a strict evaluation setting (Hsu et al. 2018), which models the sequence of tasks following three scenarios: Task Incremental Learning (Task-IL) splits the training samples into partitions of tasks, which requires task identities to select corresponding classifiers at inference time; Class Incremental Learning (Class-IL) sequentially increases the number of classes to be classified without requiring the task identities, as the hardest scenario (van de Ven et al. 2018); Domain Incremental Learning (Domain-IL) observes the same classes during each task, but the input distribution is continuously changing; task identities remain unknown. Datasets. We experiment with the following datasets: Split MNIST: the MNIST benchmark (LeCun et al. 1998) is split into 5 tasks by grouping together 2 classes. Split CIFAR-10: splitting CIFAR-10 (Krizhevsky et al. 2009) into 5 tasks, each of which introduces 2 classes. Split Tiny-ImageNet: Tiny-ImageNet (Stanford 2015) has 100,000 images across 200 classes. Each task consists of a disjoint subset of 20 classes from these 200 classes. |
| Researcher Affiliation | Collaboration | Zhen Wang1, Liu Liu1, Yiqun Duan2, Dacheng Tao3,1 1The University of Sydney, Australia, 2University of Technology Sydney, Australia, 3JD Explore Academy, China zwan4121@uni.sydney.edu.au, liu.liu1@sydney.edu.au, yiqun.duan@student.uts.edu.au, dacheng.tao@gmail.com |
| Pseudocode | Yes | Algorithm 1: Deep Retrieval and Imagination (DRI). Input: continuum dataset D, memory capacity K. Require: parameters θ, IGAN, scalars α and β, learning rate η. M ← {} (initialize memory with empty set). For t = 1, ..., T do: θ_pre ← θ; for (x, y) in D_t do: (x′, y′) ← sample(M); (x′_a, y′_a) ← (IGAN_g(x′), y′); ℓ′(x′, y′) ← α‖f_θ(x′_a) − f_θ_pre(x′_a)‖²₂ + β ℓ(θ; x′_a, y′_a); (x_b, y_b) ← rebalance((x, y), (x′_a, y′_a)); θ ← θ − η ∇_θ[ℓ(θ; x_b, y_b) + ℓ′] (Section 3.2); end for; IGAN ← update_IGAN(IGAN; D_t, M) (Section 3.3); M ← update_Memory(M; D_t, θ, K) (Eq. (8)); end for. (A hedged PyTorch rendering of this loop appears after the table.) |
| Open Source Code | No | The paper does not contain an explicit statement about releasing code or a link to a code repository. |
| Open Datasets | Yes | Datasets. We experiment with the following datasets: Split MNIST: the MNIST benchmark (LeCun et al. 1998) is split into 5 tasks by grouping together 2 classes. Split CIFAR-10: splitting CIFAR-10 (Krizhevsky et al. 2009) into 5 tasks, each of which introduces 2 classes. Split Tiny-ImageNet: Tiny-ImageNet (Stanford 2015) has 100,000 images across 200 classes. Each task consists of a disjoint subset of 20 classes from these 200 classes. (A task-split construction sketch appears after the table.) |
| Dataset Splits | Yes | We select the hyper-parameters by performing a grid search on the validation set, which is obtained by sampling 10% of the training set. (A validation-holdout sketch appears after the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments (e.g., GPU models, CPU specifications). |
| Software Dependencies | No | The paper mentions using "the stochastic gradient descent (SGD) optimizer" and "ResNet18 (He et al. 2016)" but does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | No | The paper mentions hyperparameters are selected via grid search on the validation set, and that models are trained with SGD, but does not explicitly list specific values for learning rate, batch size, epochs, or other detailed training configurations in the main text. |
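
The Algorithm 1 pseudocode quoted above compresses the whole DRI training loop into one table cell. The PyTorch-style sketch below unpacks it as a reading aid under stated assumptions: `sample_memory`, `rebalance`, `update_igan`, `update_memory`, and `igan.imagine` are hypothetical stand-ins for the paper's subroutines (which the pseudocode also leaves undefined), and the distillation term is written on model outputs with a mean-squared loss. It is not the authors' implementation.

```python
import copy

import torch
import torch.nn.functional as F


def train_dri(model, igan, task_loaders, memory, alpha, beta, lr, capacity):
    """Sketch of Algorithm 1 (DRI); helper functions are hypothetical stand-ins."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for task_loader in task_loaders:                   # t = 1, ..., T
        model_pre = copy.deepcopy(model).eval()        # theta_pre <- theta
        for x, y in task_loader:
            x_mem, y_mem = sample_memory(memory)       # retrieve stored exemplars
            x_aug, y_aug = igan.imagine(x_mem), y_mem  # "imagine" augmented samples via the GAN
            with torch.no_grad():
                target = model_pre(x_aug)              # frozen pre-task outputs
            # distillation + classification on imagined samples (cf. Section 3.2)
            aux_loss = alpha * F.mse_loss(model(x_aug), target) \
                + beta * F.cross_entropy(model(x_aug), y_aug)
            x_b, y_b = rebalance((x, y), (x_aug, y_aug))  # balance new vs. replayed data
            loss = F.cross_entropy(model(x_b), y_b) + aux_loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                           # theta <- theta - eta * grad
        update_igan(igan, task_loader, memory)         # cf. Section 3.3
        memory = update_memory(memory, task_loader, model, capacity)  # cf. Eq. (8)
    return model, memory
```
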
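The Open Datasets row describes Split MNIST and Split CIFAR-10 as five tasks of two classes each, and Split Tiny-ImageNet as tasks of 20 disjoint classes. The following is a minimal sketch of how such class-incremental splits are commonly built with torchvision; the grouping logic is generic and not taken from the paper's code.

```python
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms


def make_class_incremental_splits(dataset, classes_per_task):
    """Group a labeled dataset into tasks of `classes_per_task` consecutive classes."""
    targets = torch.as_tensor(dataset.targets)
    num_classes = int(targets.max().item()) + 1
    tasks = []
    for start in range(0, num_classes, classes_per_task):
        keep = torch.isin(targets, torch.arange(start, start + classes_per_task))
        tasks.append(Subset(dataset, torch.nonzero(keep).flatten().tolist()))
    return tasks


# Split CIFAR-10: five tasks of two classes each; Split MNIST is built the same way,
# and Split Tiny-ImageNet would use classes_per_task=20.
cifar_train = datasets.CIFAR10(root="./data", train=True, download=True,
                               transform=transforms.ToTensor())
cifar_tasks = make_class_incremental_splits(cifar_train, classes_per_task=2)
```
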
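The Dataset Splits row quotes the paper's 10% validation holdout used for the hyper-parameter grid search. Below is a minimal sketch of one way to carve out such a holdout with `torch.utils.data.random_split`; the exact sampling procedure is an assumption, since the paper does not document it.

```python
import torch
from torch.utils.data import random_split


def split_train_val(train_set, val_fraction=0.1, seed=0):
    """Hold out `val_fraction` of the training data for hyper-parameter search."""
    val_size = int(len(train_set) * val_fraction)
    generator = torch.Generator().manual_seed(seed)
    return random_split(train_set, [len(train_set) - val_size, val_size],
                        generator=generator)


# Example: 90/10 split of one task's training data before running the grid search.
# train_subset, val_subset = split_train_val(cifar_tasks[0])
```
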