Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Pretrained Language Model in Continual Learning: A Comparative Study
Authors: Tongtong Wu, Massimo Caccia, Zhuang Li, Yuan-Fang Li, Guilin Qi, Gholamreza Haffari
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experimental analyses reveal interesting performance differences across PLMs and across CL methods. We conduct experiments over (1) two primary continual learning settings, task-incremental learning and class-incremental learning; (2) three benchmark datasets with different data distributions and task definitions, covering relation extraction, event classification, and intent detection; (3) four CL approaches with six baseline methods implemented for systematic comparison; and (4) five pretrained language models. |
| Researcher Affiliation | Academia | Tongtong Wu (1, 2), Massimo Caccia (3), Zhuang Li (2), Yuan-Fang Li (2), Guilin Qi (1), Gholamreza Haffari (2); 1 Southeast University, 2 Monash University, 3 MILA |
| Pseudocode | Yes | Algorithm 1: Function of Layer Evaluation, EvaluateLayer() |
| Open Source Code | Yes | To encourage more research on continual learning in NLP, we release the code and dataset as an open-access resource on https://github.com/wutong8023/PLM4CL.git. |
| Open Datasets | Yes | We evaluate our methods on 3 datasets with distinct label distributions, covering the following domains. CLINC150 (Larson et al., 2019) is an intent classification dataset... Maven (Wang et al., 2020) is a long-tailed event detection dataset... WebRED (Ormandi et al., 2021) is a severely long-tailed relation classification dataset... To encourage more research on continual learning in NLP, we release the code and dataset as an open-access resource on https://github.com/wutong8023/PLM4CL.git. |
| Dataset Splits | Yes | For each class, we randomly split the dataset into train, validation, and test sets in a 10:2:3 ratio. |
| Hardware Specification | No | The computational resources for this work were supported by the Multi-modal Australian ScienceS Imaging and Visualisation Environment (MASSIVE) (www.massive.org.au). |
| Software Dependencies | No | To provide a fair comparison among CL methods, we train all the networks using the AdamW (Mosbach et al., 2021) optimizer, and select 10e-5 as the learning rate for all pretrained backbone models. |
| Experiment Setup | Yes | To provide a fair comparison among CL methods, we train all the networks using the AdamW (Mosbach et al., 2021) optimizer, and select 10e-5 as the learning rate for all pretrained backbone models. (Table 2 lists specific hyperparameters: EWC λ = 1,000,000 and γ = 0.2; ER buffer size = 200; HAT s_max = 400; DERPP α = 0.5 and β = 1.) |
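The per-class 10:2:3 split reported under Dataset Splits can be sketched as follows. This is a minimal illustration, not code from the PLM4CL repository: the function name `split_per_class`, the `(text, label)` tuple format, the fixed seed, and sending rounding leftovers to the test set are all our assumptions.

```python
import random
from collections import defaultdict


def split_per_class(examples, ratio=(10, 2, 3), seed=0):
    """Hypothetical helper: split (text, label) pairs into train/val/test per class.

    Each class is shuffled and cut independently, so every class keeps the
    train:val:test proportions. Items lost to integer rounding go to test.
    """
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    total = sum(ratio)
    train, val, test = [], [], []
    for label in sorted(by_label):  # iterate labels in a deterministic order
        items = list(by_label[label])
        rng.shuffle(items)
        n = len(items)
        n_train = n * ratio[0] // total
        n_val = n * ratio[1] // total
        train.extend(items[:n_train])
        val.extend(items[n_train:n_train + n_val])
        test.extend(items[n_train + n_val:])
    return train, val, test


# Usage: a toy dataset with 150 examples of class "a" and 30 of class "b".
data = [(f"a{i}", "a") for i in range(150)] + [(f"b{i}", "b") for i in range(30)]
train, val, test = split_per_class(data)
print(len(train), len(val), len(test))  # → 120 24 36
```

Splitting within each class, rather than over the pooled data, matters for the long-tailed datasets listed above: a global random split could leave rare classes with no validation or test examples at all.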
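The EWC hyperparameters listed under Experiment Setup (λ = 1,000,000, γ = 0.2) typically enter an online-EWC penalty along the lines sketched below. This is an assumption, not the PLM4CL implementation: it follows the common online EWC formulation (a quadratic penalty weighted by a diagonal Fisher estimate, with the running Fisher decayed by γ after each task), omits the 1/2 factor that some formulations include, works on plain Python lists instead of model tensors, and uses hypothetical function names.

```python
def ewc_penalty(params, anchor, fisher, lam):
    """Online-EWC penalty: lam * sum_i F_i * (theta_i - theta*_i)^2.

    params: current parameter values; anchor: values saved after the previous
    task; fisher: running diagonal Fisher estimate; lam: penalty strength.
    """
    return lam * sum(f * (p - a) ** 2 for p, a, f in zip(params, anchor, fisher))


def update_fisher(running_fisher, new_fisher, gamma):
    """Decay the running Fisher by gamma, then add the estimate from the task just finished."""
    return [gamma * old + new for old, new in zip(running_fisher, new_fisher)]


# Values reported in Table 2 for EWC.
LAMBDA, GAMMA = 1_000_000, 0.2

# Usage with made-up numbers: one finished task, two parameters.
fisher = update_fisher([0.0, 0.0], [1e-6, 2e-6], GAMMA)
print(ewc_penalty([0.51, -0.20], [0.50, -0.18], fisher, LAMBDA))
```

The penalty is added to the task loss while training on the next task; γ < 1 lets older tasks' importance estimates fade instead of accumulating without bound.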
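The DERPP weights in Table 2 (α = 0.5, β = 1) correspond to the two replay terms of the standard DER++ (dark experience replay) objective: a logit-matching term on buffered inputs and a cross-entropy term on buffered labels. The sketch below is a per-example illustration under stated assumptions: it operates on plain lists of logits rather than batched tensors, the function names are hypothetical, and it is not taken from the PLM4CL code.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one example, via a numerically stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def mse(a, b):
    """Mean squared error between two equal-length logit vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def derpp_loss(cur_logits, cur_label,
               replay_logits, stored_logits,
               replay2_logits, replay2_label,
               alpha=0.5, beta=1.0):
    """DER++ objective for one current example and two independently sampled buffer examples.

    - cross-entropy on the current example;
    - alpha * MSE between the model's logits on a buffered input and the logits stored with it;
    - beta * cross-entropy on a second buffered input, using its stored label.
    """
    return (cross_entropy(cur_logits, cur_label)
            + alpha * mse(replay_logits, stored_logits)
            + beta * cross_entropy(replay2_logits, replay2_label))


# Defaults above are the DERPP values reported in Table 2 (alpha = 0.5, beta = 1).
loss = derpp_loss([2.0, 0.5], 0, [1.0, 2.0], [1.0, 0.0], [0.0, 0.0], 1)
print(round(loss, 4))
```

Setting β = 0 recovers plain DER, which replays only the stored logits; the β term additionally anchors the model to the ground-truth labels of buffered examples.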