BiRT: Bio-inspired Replay in Vision Transformers for Continual Learning
Authors: Kishaan Jeeveswaran, Prashant Shivaram Bhat, Bahram Zonooz, Elahe Arani
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Table 1. Results on multiple datasets learned with 10 tasks with varying buffer sizes, averaged over multiple class orders. BiRT achieves consistent improvements over DyTox in different metrics, i.e. accuracy, forgetting, BWT, and FWT. The last accuracy determines the performance on past tasks after learning the last task, and the average accuracy shows the average of the last accuracy after learning every task. |
| Researcher Affiliation | Collaboration | 1Advanced Research Lab, NavInfo Europe, Netherlands; 2Dept. of Mathematics and Computer Science, Eindhoven University of Technology, Netherlands. |
| Pseudocode | Yes | Algorithm 1 BiRT Algorithm |
| Open Source Code | Yes | Code available at github.com/NeurAI-Lab/BiRT. |
| Open Datasets | Yes | We evaluate our approach on CIFAR-100 (Krizhevsky et al., 2009), ImageNet-100 (Deng et al., 2009), and Tiny ImageNet (Le and Yang, 2015). |
| Dataset Splits | Yes | ImageNet-100 consists of 129k train and 5,000 validation images of size 224x224 belonging to 100 classes. |
| Hardware Specification | Yes | All models are trained on a single NVIDIA V100 GPU, and all evaluations are performed on a single NVIDIA RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions using the 'continuum library (Douillard and Lesort, 2021)' and building on the 'DyTox (Douillard et al., 2021) framework', but it does not specify version numbers for these or other software components such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | We train models with a learning rate of 5e-4, a batch size of 128, and a weight decay of 1e-6. All models, including the baseline, are trained for 500 epochs per task in CIFAR-100 (Krizhevsky et al., 2009), Tiny ImageNet (Le and Yang, 2015), and ImageNet-100 (Deng et al., 2009). |
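The reported setup (batch size 128, 500 epochs per task, 10 tasks on CIFAR-100) makes it easy to sanity-check the training budget when reproducing. A back-of-the-envelope sketch, assuming CIFAR-100's standard 50,000 training images are split evenly across the 10 tasks and the last partial batch is kept:

```python
import math

def iters_per_task(train_images: int, num_tasks: int,
                   batch_size: int, epochs: int) -> int:
    """Rough optimizer-step count per task, assuming an equal class
    split across tasks and no drop_last on the final partial batch."""
    images_per_task = train_images // num_tasks
    batches_per_epoch = math.ceil(images_per_task / batch_size)
    return batches_per_epoch * epochs

# CIFAR-100: 50,000 train images, 10 tasks, batch 128, 500 epochs/task
print(iters_per_task(50_000, 10, 128, 500))  # → 20000 steps per task
```

This ignores any replay-buffer samples mixed into batches, so treat it as a lower bound on the per-task compute when budgeting GPU time.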