Learn more, but bother less: parameter efficient continual learning
Authors: Fuli Qiao, Mehrdad Mahdavi
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results on continual learning benchmarks validate the efficacy of our proposed method, which outperforms existing state-of-the-art methods in reducing forgetting, enhancing task performance, and preserving the model's ability to generalize to unseen tasks. |
| Researcher Affiliation | Academia | Fuli Qiao Pennsylvania State University fvq5015@psu.edu Mehrdad Mahdavi Pennsylvania State University mzm616@psu.edu |
| Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks. Figure 2 is a framework overview diagram, not pseudocode. |
| Open Source Code | No | The NeurIPS checklist states: "Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: We use open-source datasets and models, but do not attach the code." |
| Open Datasets | Yes | We evaluate our approach using a CL benchmark specifically designed for language models. This benchmark comprises five text classification datasets: AG News, Amazon Reviews, Yelp Reviews, DBpedia, and Yahoo Answers, as introduced by [51]... Table 6: The details of 15 datasets utilized in our continual learning (CL) experiments, including the evaluation metrics used for assessment. Our selection encompasses datasets from established benchmarks: the standard CL benchmark [51], GLUE [42], and SuperGLUE benchmarks [41], and added IMDB movie reviews dataset. |
| Dataset Splits | Yes | For each task, we train using 1000 randomly selected samples and validate using 500 samples per class, following the methodology of [35]. |
| Hardware Specification | Yes | All our experiments involving T5 models were performed on a server outfitted with four NVIDIA A6000 GPUs, utilizing the DeepSpeed repository for implementation. |
| Software Dependencies | No | The paper mentions "utilizing the Deep Speed repository" but does not specify a version number for DeepSpeed or any other software dependency. |
| Experiment Setup | Yes | For every sequence of tasks across different orders, we standardized our experimental setup as follows: A constant learning rate of 1e-3 was maintained throughout the experiments. We used a total batch size of 32, distributed as 8 per GPU to leverage the computational capabilities of all four A6000 GPUs efficiently. We set the dropout rate at 0.1. We applied a regularization rate of 0.1 to the orthogonal matrices derived from the Singular Value Decomposition (SVD). A weight decay of 0.0 was employed, indicating no additional penalty on the model's weights during training. |
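The reported hyperparameters can be collected into a single configuration for a reproduction attempt. This is a minimal sketch, assuming a plain dictionary of the values quoted above; the key names are illustrative and do not come from the authors' (unreleased) code.

```python
# Hypothetical reproduction config assembled from the paper's reported setup.
# Key names are our own; only the numeric values are taken from the paper.
config = {
    "learning_rate": 1e-3,       # constant throughout training
    "total_batch_size": 32,      # distributed across all GPUs
    "per_gpu_batch_size": 8,     # 8 per GPU on four A6000s
    "num_gpus": 4,
    "dropout": 0.1,
    "ortho_reg_rate": 0.1,       # regularization on SVD-derived orthogonal matrices
    "weight_decay": 0.0,         # no additional penalty on model weights
}

# Sanity check: per-GPU batches must multiply out to the total batch size.
assert config["per_gpu_batch_size"] * config["num_gpus"] == config["total_batch_size"]
```

A reproducer would still need to fix unspecified details (DeepSpeed version, optimizer, sequence length), which is exactly the gap the Software Dependencies row flags.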