LAMOL: LAnguage MOdeling for Lifelong Language Learning

Authors: Fan-Keng Sun*, Cheng-Hao Ho*, Hung-Yi Lee

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The results show that LAMOL prevents catastrophic forgetting without any sign of intransigence and can perform five very different language tasks sequentially with only one model. Overall, LAMOL outperforms previous methods by a considerable margin and is only 2-3% worse than multitasking, which is usually considered the LLL upper bound.
Researcher Affiliation | Academia | Fan-Keng Sun, MIT, Cambridge, MA, USA, fankeng@mit.edu; Cheng-Hao Ho, National Taiwan University, Taipei, Taiwan, jojotenya@gmail.com; Hung-Yi Lee, National Taiwan University, Taipei, Taiwan, hungyilee@ntu.edu.tw
Pseudocode | No | No pseudocode or clearly labeled algorithm block was found in the paper.
Open Source Code | Yes | The source code is available at https://github.com/jojotenya/LAMOL.
Open Datasets | Yes | Question Answering: Stanford Question Answering Dataset (SQuAD) (Rajpurkar et al., 2016)... Semantic Parsing: WikiSQL (Zhong et al., 2017)... Sentiment Analysis: Stanford Sentiment Treebank (SST, binary version) (Radford et al., 2017)... Semantic Role Labeling: QA-SRL (He et al., 2017)... Goal-Oriented Dialogue: English Wizard of Oz (WOZ) (Wen et al., 2016)... The dataset collected by Xiang Zhang (2015) is available at http://goo.gl/JyCnZq.
Dataset Splits | No | This work uses no development set; only the training and test datasets are reported.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models or memory) were provided for the experiments.
Software Dependencies | No | No specific software versions (e.g., library names with version numbers such as PyTorch 1.9) were provided.
Experiment Setup | Yes | All methods use the smallest pre-trained GPT-2 model (Radford et al., 2019) as the LM. Each task is trained for nine epochs; greedy decoding is applied during inference. For LAMOL, all experiments use k = 20 in top-k sampling and λ = 0.25 as the weight of the LM loss.
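To make the reported hyperparameters concrete, below is a minimal sketch of how a LAMOL-style training step and pseudo-sample generation could look, assuming PyTorch and the Hugging Face transformers GPT-2 implementation (only the use of the smallest GPT-2, λ = 0.25, and k = 20 are taken from the paper; everything else, including the helper names `qa_batch`, `lm_batch`, and `gen_token_id`, is a hypothetical placeholder). This is not the authors' implementation; their code is in the linked repository.

```python
# Sketch of a combined objective (task loss + lambda-weighted LM loss) and
# top-k pseudo-sample generation, under the assumptions stated above.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

LM_LAMBDA = 0.25   # weight of the LM loss (lambda reported in the paper)
TOP_K = 20         # k used for top-k sampling of pseudo-samples

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")   # smallest GPT-2
model = GPT2LMHeadModel.from_pretrained("gpt2")

def training_step(qa_batch, lm_batch, optimizer):
    """One optimization step: task (QA) loss plus LM_LAMBDA * LM loss."""
    qa_out = model(input_ids=qa_batch["input_ids"], labels=qa_batch["labels"])
    lm_out = model(input_ids=lm_batch["input_ids"], labels=lm_batch["labels"])
    loss = qa_out.loss + LM_LAMBDA * lm_out.loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def generate_pseudo_sample(gen_token_id, max_len=128):
    """Generate one pseudo-example of earlier tasks with top-k sampling."""
    ids = torch.tensor([[gen_token_id]])
    for _ in range(max_len):
        logits = model(ids).logits[:, -1, :]           # next-token logits
        topk = torch.topk(logits, TOP_K)               # keep the k best tokens
        probs = torch.softmax(topk.values, dim=-1)
        next_id = topk.indices.gather(-1, torch.multinomial(probs, 1))
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0])
```

The key design point reflected here is that a single LM head serves both roles: answering the current task and replaying earlier tasks via generated pseudo-samples, with λ balancing the two losses.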