Gradient Projection Memory for Continual Learning
Authors: Gobinda Saha, Isha Garg, Kaushik Roy
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our algorithm on diverse image classification datasets with short and long sequences of tasks and report better or on-par performance compared to the state-of-the-art approaches. |
| Researcher Affiliation | Academia | Gobinda Saha, Isha Garg & Kaushik Roy; School of Electrical and Computer Engineering, Purdue University; gsaha@purdue.edu, gargi@purdue.edu, kaushik@purdue.edu |
| Pseudocode | Yes | The pseudo-code of the algorithm is given in Algorithm 1 ("Algorithm for Continual Learning with GPM") in the appendix. (A hedged sketch of the projection and memory-update steps appears after this table.) |
| Open Source Code | Yes | Our code is available at https://github.com/sahagobinda/GPM |
| Open Datasets | Yes | Datasets: We evaluate our continual learning algorithm on Permuted MNIST (PMNIST) (LeCun et al., 1998), 10-Split CIFAR-100 (Krizhevsky, 2009), 20-Split miniImageNet (Vinyals et al., 2016) and a sequence of 5-Datasets (Ebrahimi et al., 2020b). |
| Dataset Splits | Yes | For PMNIST, we create 10 sequential tasks using different permutations, where each task has 10 classes (Ebrahimi et al., 2020a). The 10-Split CIFAR-100 is constructed by splitting the 100 classes of CIFAR-100 into 10 tasks with 10 classes per task. The 20-Split miniImageNet benchmark, used in Chaudhry et al. (2019a), is constructed by splitting the 100 classes of miniImageNet into 20 sequential tasks with 5 classes each. Finally, we use a sequence of 5-Datasets comprising CIFAR-10, MNIST, SVHN (Netzer et al., 2011), notMNIST (Bulatov, 2011) and Fashion-MNIST (Xiao et al., 2017), where classification on each dataset is treated as one task. In our experiments we do not use any data augmentation. The dataset statistics are given in Tables 4 & 5 in the appendix. (A loader sketch for these splits follows the table.) |
| Hardware Specification | Yes | We measured per-epoch training times (in Figure 2(b)) on an NVIDIA GeForce GTX 1060 GPU. For the ten sequential tasks in the PMNIST experiment, we computed the per-epoch training time for each task and report the average over all tasks. Training times for the different algorithms reported in Table 2(a) for the PMNIST tasks were measured on a single NVIDIA GeForce GTX 1060 GPU. For all other datasets, the training times for the different algorithms reported in Table 2(b) were measured on a single NVIDIA GeForce GTX 1080 Ti GPU. |
| Software Dependencies | No | The paper mentions implementing models and using libraries (e.g., "EWC and HAT are implemented from the official implementation provided by Serrà et al. (2018)"), but it does not specify exact version numbers for programming languages, frameworks (like PyTorch or TensorFlow), or other key software components. |
| Experiment Setup | Yes | Training Details: We train all models with plain stochastic gradient descent (SGD). For each task in PMNIST and Split miniImageNet we train the network for 5 and 10 epochs respectively with a batch size of 10. In the Split CIFAR-100 and 5-Datasets experiments, we train each task for a maximum of 200 and 100 epochs respectively, with early termination based on the validation loss as proposed in Serrà et al. (2018). For both datasets the batch size is set to 64. For GEM, A-GEM and ER_Res the episodic memory size is chosen to be approximately the same as the maximum GPM size (GPM Max). (A minimal per-task training-loop sketch is given below.) |
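
For context on the pseudocode row, the sketch below illustrates the two core GPM steps the paper describes: projecting each layer's gradient onto the orthogonal complement of the stored subspace during training, and growing the memory with leading singular vectors of the layer's representation matrix after each task. The function names, the `proj_mats`/`basis` data structures, and the energy threshold are illustrative assumptions, not the authors' exact implementation (see their repository linked above). The threshold rule shown here is applied to the residual energy, which is a simplification of the paper's layer- and task-dependent criterion.

```python
import torch

@torch.no_grad()
def project_gradients(model, proj_mats):
    # proj_mats: {weight-parameter name: M @ M.T}, where the columns of M are
    # stored bases of past tasks' input activations for that (linear) layer.
    # GPM update: g <- g - g @ (M M^T), i.e. keep only the gradient component
    # orthogonal to the subspace important for previous tasks.
    for name, p in model.named_parameters():
        if p.grad is not None and name in proj_mats:
            p.grad = p.grad - p.grad @ proj_mats[name]

def update_memory(basis, activations, threshold=0.97):
    # activations: (features x samples) representation matrix collected after a task.
    # Remove the part already captured by the existing basis, then keep the leading
    # left singular vectors until `threshold` of the residual energy is explained.
    R = activations
    if basis is not None:
        R = R - basis @ (basis.T @ R)
    U, S, _ = torch.linalg.svd(R, full_matrices=False)
    energy = torch.cumsum(S ** 2, dim=0) / torch.sum(S ** 2)
    k = int((energy < threshold).sum().item()) + 1
    new_vecs = U[:, :k]
    return new_vecs if basis is None else torch.cat([basis, new_vecs], dim=1)
```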
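
The dataset splits quoted above can be rebuilt from standard torchvision datasets. The loader sketch below is not the authors' data pipeline: the root path, seed, and helper names are assumptions, only training splits are shown (each task's permutation must also be applied to its test split), and the per-task label remapping used in multi-head evaluation is omitted.

```python
import numpy as np
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

def pmnist_tasks(root="./data", num_tasks=10, seed=0):
    # 10 Permuted-MNIST tasks: one fixed random pixel permutation per task.
    rng = np.random.RandomState(seed)
    perms = [torch.from_numpy(rng.permutation(28 * 28)) for _ in range(num_tasks)]
    tasks = []
    for perm in perms:
        tfm = transforms.Compose([
            transforms.ToTensor(),
            transforms.Lambda(lambda x, p=perm: x.view(-1)[p].view(1, 28, 28)),
        ])
        tasks.append(datasets.MNIST(root, train=True, download=True, transform=tfm))
    return tasks

def split_cifar100_tasks(root="./data", num_tasks=10):
    # 10-Split CIFAR-100: partition the 100 classes into 10 disjoint groups of 10.
    full = datasets.CIFAR100(root, train=True, download=True,
                             transform=transforms.ToTensor())
    targets = np.array(full.targets)
    classes_per_task = 100 // num_tasks
    tasks = []
    for t in range(num_tasks):
        cls = range(t * classes_per_task, (t + 1) * classes_per_task)
        idx = np.where(np.isin(targets, list(cls)))[0]
        tasks.append(Subset(full, idx))
    return tasks
```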
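
Finally, the experiment-setup row translates into a simple per-task loop: plain SGD, a fixed number of epochs for PMNIST and miniImageNet, and validation-loss-based early termination for CIFAR-100 and 5-Datasets. The sketch below is a hedged illustration; the learning rate, patience, and the exact early-stopping rule of Serrà et al. (2018) (which also decays the learning rate) are assumptions or simplifications, and the GPM projection step is only indicated by a comment.

```python
import torch
from torch.utils.data import DataLoader

def evaluate(model, dataset, loss_fn, device, batch_size=256):
    # Average validation loss over the given dataset.
    model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for x, y in DataLoader(dataset, batch_size=batch_size):
            x, y = x.to(device), y.to(device)
            total += loss_fn(model(x), y).item() * len(y)
            n += len(y)
    return total / n

def train_task(model, train_set, val_set, epochs, batch_size,
               lr=0.01, patience=5, device="cpu"):
    # Plain SGD per task; early termination when validation loss stops improving.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    best_val, bad_epochs = float("inf"), 0
    for _ in range(epochs):
        model.train()
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            # with GPM, gradients would be projected here before the step
            opt.step()
        val_loss = evaluate(model, val_set, loss_fn, device)
        if val_loss < best_val:
            best_val, bad_epochs = val_loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
```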