Pretrained Language Model in Continual Learning: A Comparative Study

Authors: Tongtong Wu, Massimo Caccia, Zhuang Li, Yuan-Fang Li, Guilin Qi, Gholamreza Haffari

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experimental analyses reveal interesting performance differences across PLMs and across CL methods. We conduct experiments over (1) two primary continual learning settings, including task-incremental learning and class-incremental learning; (2) three benchmark datasets with different data distributions and task definitions, including relation extraction, event classification, and intent detection; (3) four CL approaches with six baseline methods implemented for systematic comparison; and (4) five pretrained language models. (See the experiment-grid sketch below the table.)
Researcher Affiliation | Academia | Tongtong Wu1,2, Massimo Caccia3, Zhuang Li2, Yuan-Fang Li2, Guilin Qi1, Gholamreza Haffari2; 1Southeast University, 2Monash University, 3MILA
Pseudocode | Yes | Algorithm 1: Function of Layer Evaluation, EvaluateLayer()
Open Source Code | Yes | To encourage more research on continual learning in NLP, we release the code and dataset as an open-access resource on https://github.com/wutong8023/PLM4CL.git
Open Datasets | Yes | We evaluate our methods on 3 datasets with distinct label distributions, covering the following domains. CLINC150 (Larson et al., 2019) is an intent classification dataset... MAVEN (Wang et al., 2020) is a long-tailed event detection dataset... WebRED (Ormandi et al., 2021) is a severely long-tailed relation classification dataset... To encourage more research on continual learning in NLP, we release the code and dataset as an open-access resource on https://github.com/wutong8023/PLM4CL.git
Dataset Splits | Yes | For each class, we randomly split the dataset into train, validation and test sets by 10:2:3. (See the per-class split sketch below the table.)
Hardware Specification | No | The computational resources for this work were supported by the Multi-modal Australian ScienceS Imaging and Visualisation Environment (MASSIVE) (www.massive.org.au).
Software Dependencies | No | To provide a fair comparison among CL methods, we train all the networks using the AdamW (Mosbach et al., 2021) optimizer, and select 10e-5 as the learning rate for all pretrained backbone models.
Experiment Setup | Yes | To provide a fair comparison among CL methods, we train all the networks using the AdamW (Mosbach et al., 2021) optimizer, and select 10e-5 as the learning rate for all pretrained backbone models. (Table 2 lists method-specific hyper-parameters: EWC λ = 1,000,000, γ = 0.2; ER buffer size = 200; HAT smax = 400; DERPP α = 0.5, β = 1.) (See the training-configuration sketch below the table.)
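
Experiment-grid sketch. To make the scope quoted in the Research Type row concrete, the snippet below simply enumerates the combinations described in the paper: two CL settings, the three datasets named in the Open Datasets row, and the CL methods named in the hyper-parameter table. The PLM identifiers and the `run_experiment` driver are illustrative assumptions, not the authors' code.

```python
# Illustrative enumeration of the experimental grid described in the paper.
# Settings, datasets, and CL methods are taken from the report above;
# the PLM identifiers are placeholders (the paper compares five PLMs).
from itertools import product

settings = ["task-incremental", "class-incremental"]
datasets = ["CLINC150", "MAVEN", "WebRED"]
cl_methods = ["EWC", "ER", "HAT", "DERPP"]
plms = ["bert-base-uncased", "roberta-base"]  # hypothetical examples


def run_experiment(setting: str, dataset: str, method: str, plm: str) -> None:
    """Placeholder for a single training/evaluation run."""
    print(f"[{setting}] {dataset} | {method} | {plm}")


if __name__ == "__main__":
    for setting, dataset, method, plm in product(settings, datasets, cl_methods, plms):
        run_experiment(setting, dataset, method, plm)
```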
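Per-class split sketch. The Dataset Splits row quotes a per-class 10:2:3 train/validation/test split. The function below is a minimal sketch of one way to implement such a split; the data layout (a list of `(text, label)` pairs) and the random seed are assumptions, not details taken from the released code.

```python
# Minimal sketch of a per-class 10:2:3 split (train : validation : test).
import random
from collections import defaultdict


def split_per_class(examples, ratios=(10, 2, 3), seed=0):
    """Split (text, label) examples into train/val/test with a per-class ratio."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)
    train, val, test = [], [], []
    total = sum(ratios)
    for label, items in by_label.items():
        rng.shuffle(items)
        n = len(items)
        n_train = n * ratios[0] // total
        n_val = n * ratios[1] // total
        train.extend(items[:n_train])
        val.extend(items[n_train:n_train + n_val])
        test.extend(items[n_train + n_val:])
    return train, val, test
```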
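Training-configuration sketch. The Experiment Setup row reports AdamW with a learning rate of 10e-5 for all backbones, plus per-method hyper-parameters. The snippet below collects those reported values into a configuration and builds the optimizer; the dictionary layout and the placeholder backbone are assumptions made so the sketch runs without downloading a real PLM.

```python
# Reported training values collected into a config; structure is an assumption.
import torch
from torch import nn

training_config = {
    "optimizer": "AdamW",
    "learning_rate": 10e-5,  # learning rate as quoted in the paper
}

method_hparams = {
    "EWC":   {"lambda": 1_000_000, "gamma": 0.2},
    "ER":    {"buffer_size": 200},
    "HAT":   {"smax": 400},
    "DERPP": {"alpha": 0.5, "beta": 1.0},
}

# Placeholder backbone (hidden size and class count are assumptions);
# in practice this would be the chosen pretrained language model.
backbone = nn.Linear(768, 150)
optimizer = torch.optim.AdamW(backbone.parameters(), lr=training_config["learning_rate"])
```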