Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

LAMOL: LAnguage MOdeling for Lifelong Language Learning

Authors: Fan-Keng Sun*, Cheng-Hao Ho*, Hung-Yi Lee

ICLR 2020 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | The results show that LAMOL prevents catastrophic forgetting without any sign of intransigence and can perform five very different language tasks sequentially with only one model. Overall, LAMOL outperforms previous methods by a considerable margin and is only 2-3% worse than multitasking, which is usually considered the LLL upper bound. |
| Researcher Affiliation | Academia | Fan-Keng Sun, MIT, Cambridge, MA, USA; Cheng-Hao Ho, National Taiwan University, Taipei, Taiwan; Hung-Yi Lee, National Taiwan University, Taipei, Taiwan |
| Pseudocode | No | No pseudocode or clearly labeled algorithm block was found in the paper. |
| Open Source Code | Yes | The source code is available at https://github.com/jojotenya/LAMOL. |
| Open Datasets | Yes | Question Answering: Stanford Question Answering Dataset (SQuAD) (Rajpurkar et al., 2016)... Semantic Parsing: WikiSQL (Zhong et al., 2017)... Sentiment Analysis: Stanford Sentiment Treebank (SST, binary version) (Radford et al., 2017)... Semantic Role Labeling: QA-SRL (He et al., 2017)... Goal-Oriented Dialogue: English Wizard of Oz (WOZ) (Wen et al., 2016)... The dataset collected by Xiang Zhang (2015) is available at http://goo.gl/JyCnZq. |
| Dataset Splits | No | This work uses no development set; only the training and test datasets are described. |
| Hardware Specification | No | No specific hardware details (such as GPU/CPU models or memory) were provided for the experiments. |
| Software Dependencies | No | No specific software versions (e.g., library names with version numbers such as PyTorch 1.9) were provided. |
| Experiment Setup | Yes | All methods use the smallest pre-trained GPT-2 model (Radford et al., 2019) as the LM. Each task is trained for nine epochs; greedy decoding is applied during inference. For LAMOL, all experiments set k = 20 in top-k sampling and λ = 0.25 as the weight of the LM loss. |
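The two LAMOL hyperparameters named in the Experiment Setup row (k = 20 for top-k sampling of pseudo-samples, λ = 0.25 for the language-modeling loss weight) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the NumPy-on-logits formulation are assumptions, and in the actual repository these steps operate on GPT-2 tensors.

```python
import numpy as np

def top_k_sample(logits, k=20, rng=None):
    """Sample one token id from the k highest-scoring logits (k = 20 in the paper)."""
    rng = rng or np.random.default_rng()
    top = np.argpartition(logits, -k)[-k:]           # indices of the k largest logits
    probs = np.exp(logits[top] - logits[top].max())  # softmax restricted to the top-k
    probs /= probs.sum()
    return int(rng.choice(top, p=probs))

def lamol_loss(task_loss, lm_loss, lam=0.25):
    """Total training loss: task loss plus lambda-weighted LM loss (lambda = 0.25)."""
    return task_loss + lam * lm_loss
```

Restricting sampling to the top k logits keeps the generated pseudo-samples of old tasks fluent while still diverse, and the small λ keeps the auxiliary LM objective from dominating the task objective.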