Learn more, but bother less: parameter efficient continual learning

Authors: Fuli Qiao, Mehrdad Mahdavi

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experimental results on continual learning benchmarks validate the efficacy of our proposed method, which outperforms existing state-of-the-art methods in reducing forgetting, enhancing task performance, and preserving the model's ability to generalize to unseen tasks."
Researcher Affiliation | Academia | Fuli Qiao, Pennsylvania State University (fvq5015@psu.edu); Mehrdad Mahdavi, Pennsylvania State University (mzm616@psu.edu)
Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks; Figure 2 is a framework overview diagram, not pseudocode.
Open Source Code | No | The NeurIPS checklist states: "Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: We use open-source datasets and models, but do not attach the code."
Open Datasets | Yes | "We evaluate our approach using a CL benchmark specifically designed for language models. This benchmark comprises five text classification datasets: AG News, Amazon Reviews, Yelp Reviews, DBpedia, and Yahoo Answers, as introduced by [51]... Table 6: The details of 15 datasets utilized in our continual learning (CL) experiments, including the evaluation metrics used for assessment. Our selection encompasses datasets from established benchmarks: the standard CL benchmark [51], the GLUE [42] and SuperGLUE [41] benchmarks, and the added IMDB movie reviews dataset."
Dataset Splits | Yes | "For each task, we train using 1000 randomly selected samples and validate using 500 samples per class, following the methodology of [35]." (A hedged sampling sketch follows the table.)
Hardware Specification | Yes | "All our experiments involving T5 models were performed on a server outfitted with four NVIDIA A6000 GPUs, utilizing the DeepSpeed repository for implementation."
Software Dependencies | No | The paper mentions "utilizing the DeepSpeed repository" but does not specify a version number for DeepSpeed or any other software dependency.
Experiment Setup | Yes | "For every sequence of tasks across different orders, we standardized our experimental setup as follows: a constant learning rate of 1e-3 was maintained throughout the experiments. We used a total batch size of 32, distributed as 8 per GPU to leverage the computational capabilities of all four A6000 GPUs efficiently. We set the dropout rate at 0.1. We applied a regularization rate of 0.1 to the orthogonal matrices derived from the Singular Value Decomposition (SVD). A weight decay of 0.0 was employed, indicating no additional penalty on the model's weights during training." (Hedged config and regularizer sketches follow the table.)
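
The Dataset Splits row reports 1000 randomly selected training samples per task and 500 validation samples per class. Below is a minimal sketch of that sampling with the Hugging Face datasets library, using AG News (one of the five benchmark tasks) as a stand-in; drawing validation data from the test split, the seed, and all other preprocessing choices are assumptions, since the paper's pipeline is not released.

```python
from datasets import load_dataset

# AG News is one of the five tasks named in the paper; the sampling
# below mirrors the reported split (1000 random training samples per
# task, 500 validation samples per class). Sourcing validation data
# from the test split is an assumption.
raw = load_dataset("ag_news")

train = raw["train"].shuffle(seed=42).select(range(1000))

# Collect the first 500 shuffled examples of each class for validation.
val_pool = raw["test"].shuffle(seed=42)
per_class = {label: [] for label in set(val_pool["label"])}
for i, label in enumerate(val_pool["label"]):
    if len(per_class[label]) < 500:
        per_class[label].append(i)
val = val_pool.select([i for idxs in per_class.values() for i in idxs])

print(len(train), len(val))  # 1000 and 500 * number of classes (4 for AG News)
```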
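
The Hardware Specification and Experiment Setup rows pin down most of the distributed-training knobs: four A6000 GPUs, a total batch size of 32 split as 8 per GPU, a constant learning rate of 1e-3, and weight decay 0.0. A minimal DeepSpeed-style configuration consistent with those numbers might look like the sketch below; the optimizer choice (AdamW) and any key not quoted in the table are assumptions, since the paper does not release its config.

```python
# Hypothetical DeepSpeed config reconstructed from the reported setup:
# 32 total = 8 per GPU x 4 GPUs x 1 gradient-accumulation step.
ds_config = {
    "train_batch_size": 32,
    "train_micro_batch_size_per_gpu": 8,
    "gradient_accumulation_steps": 1,
    "optimizer": {
        "type": "AdamW",  # assumption: the paper does not name its optimizer
        "params": {"lr": 1e-3, "weight_decay": 0.0},
    },
}
# This dict would typically be serialized to a JSON file (or passed
# directly) when calling deepspeed.initialize(...) with the T5 model.
```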
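
The Experiment Setup row also mentions a regularization rate of 0.1 applied to the orthogonal matrices derived from SVD. The paper's exact loss term is not reproduced in the quoted text, so the sketch below shows one generic way such a penalty is often written: a Frobenius-norm term pushing each SVD factor toward column orthogonality. The penalty form, ranks, and shapes are illustrative assumptions; only the rate 0.1 comes from the paper.

```python
import torch

def orthogonality_penalty(mats, rate=0.1):
    """Sum of ||M^T M - I||_F^2 over factor matrices, scaled by `rate`.

    The rate 0.1 matches the paper's reported regularization rate; the
    penalty form itself is an illustrative assumption, not the paper's
    exact loss.
    """
    loss = 0.0
    for m in mats:
        gram = m.transpose(-2, -1) @ m
        eye = torch.eye(gram.shape[-1], device=m.device, dtype=m.dtype)
        loss = loss + (gram - eye).pow(2).sum()
    return rate * loss

# Usage: take rank-8 factors from an SVD of a weight matrix and add the
# penalty to the task loss during training (shapes are illustrative).
W = torch.randn(768, 768)
U, S, Vh = torch.linalg.svd(W, full_matrices=False)
reg = orthogonality_penalty([U[:, :8], Vh[:8, :].transpose(-2, -1)])
```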