Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study
Authors: Mingxu Tao, Yansong Feng, Dongyan Zhao
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments reveal that BERT can actually generate high quality representations for previously learned tasks in the long term, under extremely sparse replay or even no replay. We further introduce a series of novel methods to interpret the mechanism of forgetting and how memory rehearsal plays a significant role in task incremental learning, which bridges the gap between our new discovery and previous studies about catastrophic forgetting. |
| Researcher Affiliation | Academia | Mingxu Tao (1,2), Yansong Feng (1,3), Dongyan Zhao (1,2); 1 Wangxuan Institute of Computer Technology, Peking University, China; 2 Center for Data Science, Peking University, China; 3 The MOE Key Laboratory of Computational Linguistics, Peking University, China. {thomastao, fengyansong, zhaody}@pku.edu.cn |
| Pseudocode | Yes | Algorithm 1: Calculating the Representation Cone |
| Open Source Code | No | Code will be released at https://github.com/kobayashikanna01/plms_are_lifelong_learners |
| Open Datasets | Yes | Its text classification part is rearranged from five datasets used by Zhang et al. (2015), consisting of 4 text classification tasks: news classification (AGNews, 4 classes), ontology prediction (DBPedia, 14 classes), sentiment analysis (Amazon and Yelp, 5 shared classes), topic classification (Yahoo, 10 classes). ... As for question answering, this benchmark contains 3 datasets: SQuAD 1.1 (Rajpurkar et al., 2016), TriviaQA (Joshi et al., 2017), and QuAC (Choi et al., 2018). |
| Dataset Splits | No | The paper mentions training and testing examples but does not explicitly describe a separate validation set or its split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software libraries, frameworks, or programming languages used in the experiments. |
| Experiment Setup | Yes | To compare with prior works (d'Autume et al., 2019; Wang et al., 2020b), we retain consistent experimental setups with them, where the maximum length of tokens and batch size are set to 128 and 32, respectively. ... We employ Adam (Kingma & Ba, 2015) as the optimizer. ... On each task, the model is finetuned for 15K steps... We set batch size as 16 and learning rate as 3 × 10⁻⁵ without decay. |
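
The quoted setup (maximum sequence length 128, batch size 32 for the classification comparison, Adam with a constant learning rate of 3 × 10⁻⁵, and 15K fine-tuning steps per task) is concrete enough to sketch the corresponding training loop. The snippet below is a minimal illustration only, not the authors' implementation (their code had not been released at the time of this report); it assumes the Hugging Face `transformers` and PyTorch APIs, and `load_task_dataloader` is a hypothetical helper standing in for the benchmark-specific data pipeline.

```python
# Sketch of sequential (task-incremental) fine-tuning of BERT on the
# text-classification tasks, using the hyperparameters quoted above.
# `load_task_dataloader` is hypothetical; num_labels=33 assumes a single
# head over the union of the 4 tasks' label sets (4 + 14 + 5 + 10).
import torch
from torch.optim import Adam
from transformers import BertForSequenceClassification, BertTokenizerFast

MAX_LENGTH = 128         # maximum number of tokens per example
BATCH_SIZE = 32          # the quote uses 16 for the QA experiments
LEARNING_RATE = 3e-5     # constant, no decay
STEPS_PER_TASK = 15_000  # fine-tuning steps on each task

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=33
)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
optimizer = Adam(model.parameters(), lr=LEARNING_RATE)

# One possible task order; the paper evaluates several orderings.
tasks = ["agnews", "dbpedia", "amazon", "yelp", "yahoo"]

for task in tasks:
    loader = load_task_dataloader(task, tokenizer, MAX_LENGTH, BATCH_SIZE)
    batches = iter(loader)
    model.train()
    for step in range(STEPS_PER_TASK):
        try:
            batch = next(batches)
        except StopIteration:        # restart the loader on small tasks
            batches = iter(loader)
            batch = next(batches)
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss   # cross-entropy over the shared label space
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

Details the quoted text leaves open, such as whether optimizer state is reset between tasks or how sparse replay examples are interleaved, are deliberately omitted here; the sketch simply continues optimization from one task to the next.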