Memory Efficient Continual Learning with Transformers
Authors: Beyza Ermis, Giovanni Zappella, Martin Wistuba, Aditya Rawal, Cédric Archambeau
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On both image and text classification tasks, we empirically demonstrate that our method maintains a good predictive performance without retraining the model or increasing the number of model parameters over time. The resulting model is also significantly faster at inference time compared to Adapter-based state-of-the-art methods. |
| Researcher Affiliation | Industry | Beyza Ermis (Amazon Web Services, ermibeyz@amazon.com); Giovanni Zappella (Amazon Web Services, zappella@amazon.com); Martin Wistuba (Amazon Web Services, marwistu@amazon.com); Aditya Rawal (Amazon Web Services, adirawal@amazon.com); Cédric Archambeau (Amazon Web Services, cedrica@amazon.com) |
| Pseudocode | Yes | Algorithm 1 Adaptive Distillation of Adapters (ADA); a hedged Python sketch of the algorithm follows the table. |
| Open Source Code | No | The paper does not provide a statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We use three text datasets for multi-label text classification: Arxiv Papers [66] (paper classification), Reuters (RCV1-V2) [29] (news classification), Wiki-30K [71] (Wikipedia article classification) and two datasets for image classification: CIFAR100 [28] and Mini ImageNet [49]. |
| Dataset Splits | Yes | After splitting the data into a training and a test set, we provide the algorithm with the training set and subsequently measure its performance on the test set. An illustrative split is sketched after the table. |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., specific GPU or CPU models, memory details, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions using "Hugging Face Transformers [61]" and "Adapter-Hub [40]" but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | For all the methods, we use the same configuration for the Adapters, setting the size to 48. A minimal adapter module with this bottleneck size is sketched after the table. |
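
The paper summarizes its method as Algorithm 1, Adaptive Distillation of Adapters (ADA). The sketch below illustrates only the high-level control flow under stated assumptions: a frozen transformer `backbone` callable that accepts an `adapter` argument and returns logits, a bounded adapter pool, and soft-label knowledge distillation. The helper names (`train_adapter`, `distill`, `ada_step`) are illustrative, and picking `pool[0]` as the consolidation candidate is a deliberate simplification of the paper's adaptive selection; none of this is the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def train_adapter(backbone, adapter, loader, epochs=1, lr=1e-4):
    """Fit only the adapter's parameters on the new task; the backbone stays frozen."""
    opt = torch.optim.Adam(adapter.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            loss = F.cross_entropy(backbone(x, adapter=adapter), y)
            opt.zero_grad(); loss.backward(); opt.step()
    return adapter

def distill(backbone, student, teacher, loader, T=2.0, epochs=1, lr=1e-4):
    """Soft-label knowledge distillation: match the student's logits to the teacher's."""
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for x, _ in loader:
            with torch.no_grad():
                t_logits = backbone(x, adapter=teacher)
            s_logits = backbone(x, adapter=student)
            loss = T * T * F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                                    F.softmax(t_logits / T, dim=-1),
                                    reduction="batchmean")
            opt.zero_grad(); loss.backward(); opt.step()

def ada_step(backbone, pool, new_adapter, loader, max_adapters):
    """One continual-learning step: store the new adapter, or consolidate it
    into the pool so the parameter count stays constant over time."""
    train_adapter(backbone, new_adapter, loader)
    if len(pool) < max_adapters:
        pool.append(new_adapter)  # capacity left: no distillation needed
    else:
        # Pool is full: distill the new adapter (teacher) into an existing
        # adapter (student). The paper selects the candidate adaptively;
        # using pool[0] here is a simplification for illustration.
        distill(backbone, student=pool[0], teacher=new_adapter, loader=loader)
    return pool
```

The property the sketch tries to mirror is the one the paper claims: the backbone is never retrained and the adapter pool never grows beyond `max_adapters`, which is what keeps the number of model parameters constant over time.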
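On dataset splits, the paper states that each task's data is divided into a training and a test set but does not report the ratio. The snippet below is illustrative only: the dummy data, the 80/20 split, and the fixed seed are all assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy features/labels stand in for one task's data; the 80/20 ratio and
# the random seed are assumptions, as the paper does not report the split used.
X = np.random.rand(1000, 16)
y = np.random.randint(0, 10, size=1000)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)
```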
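The experiment setup fixes the adapter size to 48, which presumably denotes the bottleneck dimension of a standard residual adapter. Below is a minimal sketch assuming the common down-project/nonlinearity/up-project shape (as in Houlsby- or Pfeiffer-style adapters) and a BERT-base hidden size of 768; the paper's exact placement of the adapter inside the transformer block is not reproduced here.

```python
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Residual bottleneck adapter; hidden states are projected down to
    `adapter_size` dimensions and back up, then added to the input."""

    def __init__(self, hidden_dim=768, adapter_size=48):
        super().__init__()
        self.down = nn.Linear(hidden_dim, adapter_size)  # down-projection
        self.up = nn.Linear(adapter_size, hidden_dim)    # up-projection
        self.act = nn.GELU()

    def forward(self, h):
        # The residual connection keeps the pretrained representation intact;
        # only the small bottleneck (2 * 768 * 48 weights plus biases per
        # adapter) is task-specific and trainable.
        return h + self.up(self.act(self.down(h)))
```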