Gradient Projection Memory for Continual Learning
Authors: Gobinda Saha, Isha Garg, Kaushik Roy
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our algorithm on diverse image classification datasets with short and long sequences of tasks and report better or on-par performance compared to the state-of-the-art approaches. |
| Researcher Affiliation | Academia | Gobinda Saha, Isha Garg & Kaushik Roy; School of Electrical and Computer Engineering, Purdue University; gsaha@purdue.edu, gargi@purdue.edu, kaushik@purdue.edu |
| Pseudocode | Yes | The pseudo-code of the algorithm is given in Algorithm 1 ("Algorithm for Continual Learning with GPM") in the appendix. (A hedged sketch of the projection and memory-update steps appears after this table.) |
| Open Source Code | Yes | Our code is available at https://github.com/sahagobinda/GPM |
| Open Datasets | Yes | Datasets: We evaluate our continual learning algorithm on Permuted MNIST (PMNIST) (LeCun et al., 1998), 10-Split CIFAR-100 (Krizhevsky, 2009), 20-Split miniImageNet (Vinyals et al., 2016) and a sequence of 5-Datasets (Ebrahimi et al., 2020b). |
| Dataset Splits | Yes | For PMNIST, we create 10 sequential tasks using different permutations, where each task has 10 classes (Ebrahimi et al., 2020a). The 10-Split CIFAR-100 is constructed by splitting the 100 classes of CIFAR-100 into 10 tasks with 10 classes per task. The 20-Split miniImageNet benchmark, used in Chaudhry et al. (2019a), is constructed by splitting the 100 classes of miniImageNet into 20 sequential tasks with 5 classes each. Finally, we use a sequence of 5-Datasets comprising CIFAR-10, MNIST, SVHN (Netzer et al., 2011), notMNIST (Bulatov, 2011) and Fashion-MNIST (Xiao et al., 2017), where classification on each dataset is treated as one task. In our experiments we do not use any data augmentation. The dataset statistics are given in Tables 4 & 5 in the appendix. (A loader sketch for these splits follows the table.) |
| Hardware Specification | Yes | We measured per-epoch training times (in Figure 2(b)) on an NVIDIA GeForce GTX 1060 GPU. For the ten sequential tasks in the PMNIST experiment, we computed the per-epoch training time for each task and report the average over all tasks. Training times for the different algorithms reported in Table 2(a) for the PMNIST tasks were measured on a single NVIDIA GeForce GTX 1060 GPU. For all other datasets, the training times for the different algorithms reported in Table 2(b) were measured on a single NVIDIA GeForce GTX 1080 Ti GPU. |
| Software Dependencies | No | The paper mentions implementing models and using libraries (e.g., "EWC and HAT are implemented from the official implementation provided by Serrà et al. (2018)"), but it does not specify exact version numbers for programming languages, frameworks (like PyTorch or TensorFlow), or other key software components. |
| Experiment Setup | Yes | Training Details: We train all models with plain stochastic gradient descent (SGD). For each task in PMNIST and Split miniImageNet we train the network for 5 and 10 epochs respectively with a batch size of 10. In the Split CIFAR-100 and 5-Datasets experiments, we train each task for a maximum of 200 and 100 epochs respectively, with early termination based on the validation loss as proposed in Serrà et al. (2018). For both datasets the batch size is set to 64. For GEM, A-GEM and ER_Res the episodic memory size is chosen to be approximately the same as the maximum GPM size (GPM Max). (A minimal per-task training-loop sketch is given below.) |
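
For context on the pseudocode row, the sketch below illustrates the two core GPM steps the paper describes: projecting each layer's gradient onto the orthogonal complement of the stored subspace during training, and growing the memory with leading singular vectors of the layer's representation matrix after each task. The function names, the `proj_mats`/`basis` data structures, and the energy threshold are illustrative assumptions, not the authors' exact implementation (see their repository linked above). The threshold rule shown here is applied to the residual energy, which is a simplification of the paper's layer- and task-dependent criterion.

```python
import torch

@torch.no_grad()
def project_gradients(model, proj_mats):
    # proj_mats: {weight-parameter name: M @ M.T}, where the columns of M are
    # stored bases of past tasks' input activations for that (linear) layer.
    # GPM update: g <- g - g @ (M M^T), i.e. keep only the gradient component
    # orthogonal to the subspace important for previous tasks.
    for name, p in model.named_parameters():
        if p.grad is not None and name in proj_mats:
            p.grad = p.grad - p.grad @ proj_mats[name]

def update_memory(basis, activations, threshold=0.97):
    # activations: (features x samples) representation matrix collected after a task.
    # Remove the part already captured by the existing basis, then keep the leading
    # left singular vectors until `threshold` of the residual energy is explained.
    R = activations
    if basis is not None:
        R = R - basis @ (basis.T @ R)
    U, S, _ = torch.linalg.svd(R, full_matrices=False)
    energy = torch.cumsum(S ** 2, dim=0) / torch.sum(S ** 2)
    k = int((energy < threshold).sum().item()) + 1
    new_vecs = U[:, :k]
    return new_vecs if basis is None else torch.cat([basis, new_vecs], dim=1)
```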
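
The dataset splits quoted above can be rebuilt from standard torchvision datasets. The loader sketch below is not the authors' data pipeline: the root path, seed, and helper names are assumptions, only training splits are shown (each task's permutation must also be applied to its test split), and the per-task label remapping used in multi-head evaluation is omitted.

```python
import numpy as np
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

def pmnist_tasks(root="./data", num_tasks=10, seed=0):
    # 10 Permuted-MNIST tasks: one fixed random pixel permutation per task.
    rng = np.random.RandomState(seed)
    perms = [torch.from_numpy(rng.permutation(28 * 28)) for _ in range(num_tasks)]
    tasks = []
    for perm in perms:
        tfm = transforms.Compose([
            transforms.ToTensor(),
            transforms.Lambda(lambda x, p=perm: x.view(-1)[p].view(1, 28, 28)),
        ])
        tasks.append(datasets.MNIST(root, train=True, download=True, transform=tfm))
    return tasks

def split_cifar100_tasks(root="./data", num_tasks=10):
    # 10-Split CIFAR-100: partition the 100 classes into 10 disjoint groups of 10.
    full = datasets.CIFAR100(root, train=True, download=True,
                             transform=transforms.ToTensor())
    targets = np.array(full.targets)
    classes_per_task = 100 // num_tasks
    tasks = []
    for t in range(num_tasks):
        cls = range(t * classes_per_task, (t + 1) * classes_per_task)
        idx = np.where(np.isin(targets, list(cls)))[0]
        tasks.append(Subset(full, idx))
    return tasks
```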
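
Finally, the experiment-setup row translates into a simple per-task loop: plain SGD, a fixed number of epochs for PMNIST and miniImageNet, and validation-loss-based early termination for CIFAR-100 and 5-Datasets. The sketch below is a hedged illustration; the learning rate, patience, and the exact early-stopping rule of Serrà et al. (2018) (which also decays the learning rate) are assumptions or simplifications, and the GPM projection step is only indicated by a comment.

```python
import torch
from torch.utils.data import DataLoader

def evaluate(model, dataset, loss_fn, device, batch_size=256):
    # Average validation loss over the given dataset.
    model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for x, y in DataLoader(dataset, batch_size=batch_size):
            x, y = x.to(device), y.to(device)
            total += loss_fn(model(x), y).item() * len(y)
            n += len(y)
    return total / n

def train_task(model, train_set, val_set, epochs, batch_size,
               lr=0.01, patience=5, device="cpu"):
    # Plain SGD per task; early termination when validation loss stops improving.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    best_val, bad_epochs = float("inf"), 0
    for _ in range(epochs):
        model.train()
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            # with GPM, gradients would be projected here before the step
            opt.step()
        val_loss = evaluate(model, val_set, loss_fn, device)
        if val_loss < best_val:
            best_val, bad_epochs = val_loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
```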