Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Memory Efficient Continual Learning with Transformers
Authors: Beyza Ermis, Giovanni Zappella, Martin Wistuba, Aditya Rawal, Cédric Archambeau
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On both image and text classification tasks, we empirically demonstrate that our method maintains a good predictive performance without retraining the model or increasing the number of model parameters over time. The resulting model is also significantly faster at inference time compared to Adapter-based state-of-the-art methods. |
| Researcher Affiliation | Industry | Beyza Ermis Amazon Web Services EMAIL Giovanni Zappella Amazon Web Services EMAIL Martin Wistuba Amazon Web Services EMAIL Aditya Rawal Amazon Web Services EMAIL Cédric Archambeau Amazon Web Services EMAIL |
| Pseudocode | Yes | Algorithm 1 Adaptive Distillation of Adapters (ADA) |
| Open Source Code | No | The paper does not provide a statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We use three text datasets for multi-label text classification: Arxiv Papers [66] (paper classification), Reuters (RCV1-V2) [29] (news classification), Wiki-30K [71] (Wikipedia article classification) and two dataset for image classification: CIFAR100 [28] and Mini Image Net [49]. |
| Dataset Splits | Yes | After splitting the data in training and test set, we provide the algorithm with the training set and subsequently measure its performance on the test set. |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., specific GPU or CPU models, memory details, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions using "Hugging Face Transformers [61]" and "Adapter-Hub [40]" but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | For all the methods, we use the same configuration for the Adapters, setting the size to 48. |