Memory Efficient Continual Learning with Transformers

Authors: Beyza Ermis, Giovanni Zappella, Martin Wistuba, Aditya Rawal, Cédric Archambeau

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On both image and text classification tasks, we empirically demonstrate that our method maintains a good predictive performance without retraining the model or increasing the number of model parameters over time. The resulting model is also significantly faster at inference time compared to Adapter-based state-of-the-art methods.
Researcher Affiliation | Industry | Beyza Ermis (Amazon Web Services, ermibeyz@amazon.com); Giovanni Zappella (Amazon Web Services, zappella@amazon.com); Martin Wistuba (Amazon Web Services, marwistu@amazon.com); Aditya Rawal (Amazon Web Services, adirawal@amazon.com); Cédric Archambeau (Amazon Web Services, cedrica@amazon.com)
Pseudocode | Yes | Algorithm 1: Adaptive Distillation of Adapters (ADA). (See the illustrative sketch after this table.)
Open Source Code | No | The paper does not provide a statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We use three text datasets for multi-label text classification: Arxiv Papers [66] (paper classification), Reuters (RCV1-V2) [29] (news classification), and Wiki-30K [71] (Wikipedia article classification), and two datasets for image classification: CIFAR100 [28] and Mini ImageNet [49].
Dataset Splits | Yes | After splitting the data into training and test sets, we provide the algorithm with the training set and subsequently measure its performance on the test set.
Hardware Specification | No | The paper does not specify the exact hardware (e.g., specific GPU or CPU models, memory details, or cloud instance types) used to run the experiments.
Software Dependencies | No | The paper mentions using Hugging Face Transformers [61] and Adapter-Hub [40] but does not provide specific version numbers for these software components.
Experiment Setup | Yes | For all the methods, we use the same configuration for the Adapters, setting the size to 48. (An illustrative configuration sketch follows the table.)
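The Pseudocode row points to Algorithm 1, Adaptive Distillation of Adapters (ADA), but the listing itself is not reproduced here. The sketch below is only a loose illustration of an adapter pool with distillation-based consolidation, consistent with the paper's claim that the parameter count does not grow over time; the class names, pool capacity, adapter-selection heuristic, and distillation loss are all assumptions, not the authors' algorithm.

```python
# Hypothetical sketch of an ADA-style adapter pool with distillation-based
# consolidation. Class names, the selection heuristic, and the loss are
# illustrative assumptions, not the paper's Algorithm 1.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BottleneckAdapter(nn.Module):
    """Down-project / nonlinearity / up-project block with a residual path."""

    def __init__(self, hidden_dim: int = 768, adapter_dim: int = 48):
        super().__init__()
        self.down = nn.Linear(hidden_dim, adapter_dim)
        self.up = nn.Linear(adapter_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(F.relu(self.down(x)))


class AdapterPool:
    """Holds at most `capacity` adapters so the parameter count stays fixed."""

    def __init__(self, capacity: int = 2):
        self.capacity = capacity
        self.adapters: list[BottleneckAdapter] = []

    def add_task(self, new_adapter: BottleneckAdapter, features: torch.Tensor) -> None:
        if len(self.adapters) < self.capacity:
            self.adapters.append(new_adapter)  # pool not full: keep the new adapter
            return
        # Pool full: pick the existing adapter whose outputs are already
        # closest to the new adapter's on this task (assumed heuristic),
        # then distill the new adapter into it.
        with torch.no_grad():
            target = new_adapter(features)
            gaps = torch.stack([F.mse_loss(a(features), target) for a in self.adapters])
        student = self.adapters[int(gaps.argmin())]
        self._distill(teacher=new_adapter, student=student, features=features)

    @staticmethod
    def _distill(teacher, student, features, steps: int = 100, lr: float = 1e-3) -> None:
        optim = torch.optim.Adam(student.parameters(), lr=lr)
        with torch.no_grad():
            target = teacher(features)
        for _ in range(steps):
            optim.zero_grad()
            F.mse_loss(student(features), target).backward()
            optim.step()


if __name__ == "__main__":
    pool = AdapterPool(capacity=2)
    for _task in range(3):  # three sequential "tasks"
        adapter = BottleneckAdapter(hidden_dim=768, adapter_dim=48)
        feats = torch.randn(16, 768)  # stand-in for frozen-transformer features
        pool.add_task(adapter, feats)
    print("adapters kept:", len(pool.adapters))  # stays at the pool capacity
```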
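The Experiment Setup row fixes the adapter size (bottleneck dimension) at 48 for all methods. Below is a minimal configuration sketch assuming the AdapterHub fork of Hugging Face Transformers, which the paper cites as Adapter-Hub [40]; the checkpoint, adapter name, and Pfeiffer config are illustrative assumptions, and a reduction factor of 16 yields a 48-dimensional bottleneck only for a 768-dimensional hidden state.

```python
# Minimal adapter-configuration sketch assuming the AdapterHub fork of
# Hugging Face Transformers (`pip install adapter-transformers`). The
# checkpoint, adapter name, and Pfeiffer config are assumptions; only the
# bottleneck size of 48 comes from the paper.
from transformers import AutoModelWithHeads, AdapterConfig

model = AutoModelWithHeads.from_pretrained("bert-base-uncased")

# bert-base has a 768-dimensional hidden state, so reduction_factor = 16
# gives a 768 / 16 = 48-dimensional adapter bottleneck.
adapter_config = AdapterConfig.load("pfeiffer", reduction_factor=16)

model.add_adapter("task_1", config=adapter_config)
model.train_adapter("task_1")        # freezes the backbone; only the adapter trains
model.set_active_adapters("task_1")
```

In a continual-learning run, one such adapter would be trained per incoming task while the transformer backbone stays frozen, with consolidation handled along the lines of the pool sketch above.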