Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

LAMOL: LAnguage MOdeling for Lifelong Language Learning

Authors: Fan-Keng Sun*, Cheng-Hao Ho*, Hung-Yi Lee

ICLR 2020 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | The results show that LAMOL prevents catastrophic forgetting without any sign of intransigence and can perform five very different language tasks sequentially with only one model. Overall, LAMOL outperforms previous methods by a considerable margin and is only 2-3% worse than multitasking, which is usually considered the LLL upper bound. |
| Researcher Affiliation | Academia | Fan-Keng Sun, MIT, Cambridge, MA, USA; Cheng-Hao Ho, National Taiwan University, Taipei, Taiwan; Hung-Yi Lee, National Taiwan University, Taipei, Taiwan |
| Pseudocode | No | No pseudocode or clearly labeled algorithm block was found in the paper. |
| Open Source Code | Yes | The source code is available at https://github.com/jojotenya/LAMOL. |
| Open Datasets | Yes | Question Answering: Stanford Question Answering Dataset (SQuAD) (Rajpurkar et al., 2016)... Semantic Parsing: WikiSQL (Zhong et al., 2017)... Sentiment Analysis: Stanford Sentiment Treebank (SST, binary version) (Radford et al., 2017)... Semantic Role Labeling: QA-SRL (He et al., 2017)... Goal-Oriented Dialogue: English Wizard of Oz (WOZ) (Wen et al., 2016)... The dataset collected by Xiang Zhang (2015) is available at http://goo.gl/JyCnZq. |
| Dataset Splits | No | This work uses no development set; only the training and test datasets are described. |
| Hardware Specification | No | No specific hardware details (such as GPU/CPU models or memory) were provided for the experiments. |
| Software Dependencies | No | No specific software versions (e.g., library names with version numbers such as PyTorch 1.9) were provided. |
| Experiment Setup | Yes | All methods use the smallest pre-trained GPT-2 model (Radford et al., 2019) as the LM. Each task is trained for nine epochs; greedy decoding is applied during inference. For LAMOL, all experiments set k = 20 in top-k sampling and λ = 0.25 as the weight of the LM loss. |
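The two LAMOL hyperparameters named in the Experiment Setup row (k = 20 for top-k sampling of pseudo-samples, λ = 0.25 for the language-modeling loss weight) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the NumPy-on-logits formulation are assumptions, and in the actual repository these steps operate on GPT-2 tensors.

```python
import numpy as np

def top_k_sample(logits, k=20, rng=None):
    """Sample one token id from the k highest-scoring logits (k = 20 in the paper)."""
    rng = rng or np.random.default_rng()
    top = np.argpartition(logits, -k)[-k:]           # indices of the k largest logits
    probs = np.exp(logits[top] - logits[top].max())  # softmax restricted to the top-k
    probs /= probs.sum()
    return int(rng.choice(top, p=probs))

def lamol_loss(task_loss, lm_loss, lam=0.25):
    """Total training loss: task loss plus lambda-weighted LM loss (lambda = 0.25)."""
    return task_loss + lam * lm_loss
```

Restricting sampling to the top k logits keeps the generated pseudo-samples of old tasks fluent while still diverse, and the small λ keeps the auxiliary LM objective from dominating the task objective.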