Looking Back on Learned Experiences For Class/task Incremental Learning
Authors: Mozhgan Pourkeshavarz, Guoying Zhao, Mohammad Sabokrou
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Compared to the state-of-the-art data techniques without buffering past data samples, CF-IL demonstrates significantly better performance on the well-known datasets whether a task oracle is available in test time (Task-IL) or not (Class-IL). Table 2 shows the performance of CF-IL in comparison to the other mentioned methods on the two commonly used datasets. Our proposed method CF-IL has achieved the SOTA performance in almost all settings. |
| Researcher Affiliation | Academia | Mozhgan Pourkeshavarz (1), Guoying Zhao (2), Mohammad Sabokrou (1). (1) School of Computer Science, Institute for Research in Fundamental Sciences (IPM); (2) Center for Machine Vision and Signal Analysis, University of Oulu, Finland |
| Pseudocode | Yes | Algorithm 1: Cost-Free Incremental Learning and Algorithm 2: Mem. Recovery Paradigm |
| Open Source Code | Yes | The code is available at https://github.com/MozhganPourKeshavarz/Cost-Free-Incremental-Learning |
| Open Datasets | Yes | We evaluate our method on two commonly-used datasets for incremental image classification tasks: CIFAR-10 (Krizhevsky et al., 2009) and Tiny-ImageNet (Le & Yang, 2015). |
| Dataset Splits | Yes | Following the convention of the ML community, hyperparams are selected by performing a grid-search on a validation set, obtained by sampling 10% of the training set. A minimal sketch of this split follows the table. |
| Hardware Specification | No | The authors would like to thank Part Research Center (Partdp.ai) for contributing to the hardware infrastructure we used for our experiments. This statement is too general and does not provide specific hardware details (e.g., GPU/CPU models, memory) used for the experiments. |
| Software Dependencies | No | The paper mentions optimizers like Stochastic Gradient Descent (SGD) optimizer and Adam optimizer, but does not specify any software versions for libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used in the implementation. |
| Experiment Setup | Yes | We train the networks using the Stochastic Gradient Descent (SGD) optimizer with a learning rate of 0.1 with other parameters set to their default values. We consider the number of epochs per task concerning the dataset complexity; thus, we set it to 50 for Sequential CIFAR-10 and 100 for Sequential Tiny-ImageNet, respectively. In the training phase, we select a batch of data from the incremented task and a minibatch of data from the transfer set S, where depending on the hardware restriction, we set both to 32. In the MRP, we consider the temperature value τ to 20 for the distillation purpose. In the Dirichlet distribution, we set β in [1, 0.1] for each dataset... The η value in the constraint step is empirically valued at 0.7. To optimize the random noisy image, we employ the Adam optimizer with a learning rate of 0.01, while the maximum number of iterations is set to 1500. |
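
The 10% validation split reported in the "Dataset Splits" row can be reproduced in a few lines. A minimal sketch, assuming PyTorch/torchvision and a fixed seed; the paper does not state how the split is implemented.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

train_full = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())

val_size = int(0.1 * len(train_full))           # 10% of the training set held out for grid-search
train_size = len(train_full) - val_size
train_set, val_set = random_split(
    train_full, [train_size, val_size],
    generator=torch.Generator().manual_seed(0)  # fixed seed is an assumption, not stated in the paper
)
```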
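
The "Experiment Setup" row can be illustrated with a minimal training-loop sketch, assuming a PyTorch implementation: SGD with learning rate 0.1 and otherwise default parameters, 50 epochs per task on Sequential CIFAR-10 (100 on Sequential Tiny-ImageNet), and batch size 32 for both the current task and the transfer-set minibatch. `model`, `task_set`, `transfer_set`, and the plain cross-entropy objective are placeholders, not the authors' CF-IL loss.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_task(model, task_set, transfer_set, epochs=50, device="cuda"):
    """Train on one incremental task, pairing each task batch with a transfer-set minibatch."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # other SGD parameters left at defaults
    criterion = nn.CrossEntropyLoss()
    task_loader = DataLoader(task_set, batch_size=32, shuffle=True)          # batch from the incremented task
    transfer_loader = DataLoader(transfer_set, batch_size=32, shuffle=True)  # minibatch from the transfer set S

    model.to(device).train()
    for _ in range(epochs):  # 50 for Seq. CIFAR-10, 100 for Seq. Tiny-ImageNet
        for (x, y), (xs, ys) in zip(task_loader, transfer_loader):
            x, y, xs, ys = x.to(device), y.to(device), xs.to(device), ys.to(device)
            optimizer.zero_grad()
            # Placeholder objective: cross-entropy on both batches; the paper's
            # actual objective additionally uses distillation at temperature tau = 20.
            loss = criterion(model(x), y) + criterion(model(xs), ys)
            loss.backward()
            optimizer.step()
```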
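
The Memory Recovery Paradigm hyperparameters quoted above (Adam with learning rate 0.01, at most 1500 iterations, distillation temperature τ = 20, Dirichlet concentration β in [1, 0.1]) can also be sketched. The KL-divergence objective against a Dirichlet-sampled soft target is an assumption made for illustration; the authors' exact recovery loss and the constraint step with η = 0.7 are specified in Algorithm 2 and the released code.

```python
import torch
import torch.nn.functional as F

def recover_memory(model, num_classes, image_shape=(3, 32, 32),
                   tau=20.0, beta=0.1, lr=0.01, max_iters=1500, device="cuda"):
    """Optimize a random noise image so the frozen model's temperature-scaled
    output matches a soft target drawn from a Dirichlet distribution."""
    model.to(device).eval()
    target = torch.distributions.Dirichlet(
        torch.full((num_classes,), beta)).sample().to(device)  # beta in [1, 0.1] per the paper

    x = torch.randn(1, *image_shape, device=device, requires_grad=True)
    optimizer = torch.optim.Adam([x], lr=lr)          # lr = 0.01, up to 1500 iterations (paper)

    for _ in range(max_iters):
        optimizer.zero_grad()
        log_probs = F.log_softmax(model(x) / tau, dim=1)          # temperature-scaled output
        loss = F.kl_div(log_probs, target.unsqueeze(0),
                        reduction="batchmean")                    # assumed objective, not the paper's exact loss
        loss.backward()
        optimizer.step()
    return x.detach()
```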