Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
LAMOL: LAnguage MOdeling for Lifelong Language Learning
Authors: Fan-Keng Sun*, Cheng-Hao Ho*, Hung-Yi Lee
ICLR 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The results show that LAMOL prevents catastrophic forgetting without any sign of intransigence and can perform five very different language tasks sequentially with only one model. Overall, LAMOL outperforms previous methods by a considerable margin and is only 2-3% worse than multitasking, which is usually considered the LLL upper bound. |
| Researcher Affiliation | Academia | Fan-Keng Sun, MIT, Cambridge, MA, USA; Cheng-Hao Ho, National Taiwan University, Taipei, Taiwan; Hung-Yi Lee, National Taiwan University, Taipei, Taiwan |
| Pseudocode | No | No pseudocode or clearly labeled algorithm block was found in the paper. |
| Open Source Code | Yes | The source code is available at https://github.com/jojotenya/LAMOL. |
| Open Datasets | Yes | Question Answering: Stanford Question Answering Dataset (SQuAD) (Rajpurkar et al., 2016)... Semantic Parsing: WikiSQL (Zhong et al., 2017)... Sentiment Analysis: Stanford Sentiment Treebank (SST, binary version) (Radford et al., 2017)... Semantic Role Labeling: QA-SRL (He et al., 2017)... Goal-Oriented Dialogue: English Wizard of Oz (WOZ) (Wen et al., 2016)... The dataset collected by Xiang Zhang (2015) is available at http://goo.gl/JyCnZq. |
| Dataset Splits | No | This work uses no development set; only the training and test splits are reported. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory) were provided for the experiments. |
| Software Dependencies | No | No specific software versions (e.g., library names with version numbers like PyTorch 1.9) were provided. |
| Experiment Setup | Yes | All methods use the smallest pre-trained GPT-2 model (Radford et al., 2019) as the LM. Each task is trained for nine epochs; greedy decoding is applied during inference. For LAMOL, all experiments use k = 20 for top-k sampling and λ = 0.25 as the weight of the LM loss. |
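The two hyperparameters in the setup row can be made concrete with a minimal sketch. This is our illustration, not the authors' code: `top_k_sample` implements standard softmax-restricted top-k sampling (k = 20 in the paper, used when generating pseudo-samples of earlier tasks), and `lamol_loss` shows the λ-weighted combination of the task loss and the LM loss (λ = 0.25). Function names and the NumPy implementation are our assumptions.

```python
import numpy as np

def top_k_sample(logits, k=20, rng=None):
    """Sample a token id from the k highest-scoring logits.

    Standard top-k sampling: keep the k largest logits, apply a
    softmax over only those, and draw one index. LAMOL reports
    k = 20 for this step (illustrative reimplementation).
    """
    rng = rng or np.random.default_rng(0)
    logits = np.asarray(logits, dtype=np.float64)
    top = np.argsort(logits)[-k:]           # indices of the k largest logits
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                    # softmax restricted to the top k
    return int(rng.choice(top, p=probs))

def lamol_loss(task_loss, lm_loss, lam=0.25):
    """Combined objective: task loss plus lambda-weighted LM loss (λ = 0.25)."""
    return task_loss + lam * lm_loss
```

With k = 3 over logits `[0, 1, 2, 3, 4]`, only indices 2, 3, or 4 can ever be drawn, since the softmax is renormalized over the top-k subset before sampling.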