LAMOL: LAnguage MOdeling for Lifelong Language Learning
Authors: Fan-Keng Sun*, Cheng-Hao Ho*, Hung-Yi Lee
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The results show that LAMOL prevents catastrophic forgetting without any sign of intransigence and can perform five very different language tasks sequentially with only one model. Overall, LAMOL outperforms previous methods by a considerable margin and is only 2–3% worse than multitasking, which is usually considered the LLL upper bound. |
| Researcher Affiliation | Academia | Fan-Keng Sun, MIT, Cambridge, MA, USA (fankeng@mit.edu); Cheng-Hao Ho, National Taiwan University, Taipei, Taiwan (jojotenya@gmail.com); Hung-Yi Lee, National Taiwan University, Taipei, Taiwan (hungyilee@ntu.edu.tw) |
| Pseudocode | No | No pseudocode or clearly labeled algorithm block was found in the paper. |
| Open Source Code | Yes | The source code is available at https://github.com/jojotenya/LAMOL. |
| Open Datasets | Yes | Question Answering: Stanford Question Answering Dataset (SQuAD) (Rajpurkar et al., 2016)... Semantic Parsing: WikiSQL (Zhong et al., 2017)... Sentiment Analysis: Stanford Sentiment Treebank (SST, binary version) (Radford et al., 2017)... Semantic Role Labeling: QA-SRL (He et al., 2017)... Goal-Oriented Dialogue: English Wizard of Oz (WOZ) (Wen et al., 2016)... The dataset collected by Xiang Zhang (2015) is available at http://goo.gl/JyCnZq. |
| Dataset Splits | No | Note that this work uses no development set; only the training and test datasets are shown. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory) were provided for the experiments. |
| Software Dependencies | No | No specific software versions (e.g., library names with version numbers like PyTorch 1.9) were provided. |
| Experiment Setup | Yes | All methods use the smallest pre-trained GPT-2 model (Radford et al., 2019) as the LM. Each task is trained for nine epochs; greedy decoding is applied during inference. For LAMOL, k = 20 is used in top-k sampling and λ = 0.25 as the weight of the LM loss in all experiments (see the sketch below the table). |
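
The hyperparameters reported in the Experiment Setup row (smallest GPT-2, nine epochs per task, top-k sampling with k = 20, LM loss weight λ = 0.25) can be summarized in a short sketch. The snippet below is a minimal, hypothetical illustration only: it assumes the Hugging Face `transformers` GPT-2 implementation, and the helper names `training_step` and `generate_pseudo_sample` are invented for this sketch. It is not the authors' implementation, which is available at the repository linked above.

```python
# Hypothetical sketch of the reported LAMOL setup: smallest GPT-2 as the LM,
# a task loss combined with an auxiliary LM loss weighted by lambda = 0.25,
# and pseudo-sample generation via top-k sampling with k = 20.
# Function and variable names here are illustrative, not from the official repo.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

LAMBDA_LM = 0.25   # weight of the LM loss reported in the paper
TOP_K = 20         # top-k sampling parameter reported in the paper
EPOCHS = 9         # each task is trained for nine epochs

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")   # smallest pre-trained GPT-2
model = GPT2LMHeadModel.from_pretrained("gpt2")

def training_step(qa_batch, lm_batch, optimizer):
    """One optimization step on the combined task + LM objective (sketch)."""
    # Task (QA-format) loss: the model answers the question given context + question.
    qa_out = model(input_ids=qa_batch, labels=qa_batch)
    # LM loss: the model also learns to generate whole training samples,
    # which later lets it replay pseudo-samples of earlier tasks.
    lm_out = model(input_ids=lm_batch, labels=lm_batch)
    loss = qa_out.loss + LAMBDA_LM * lm_out.loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def generate_pseudo_sample(gen_token="[GEN]", max_length=128):
    """Sample one pseudo training example with top-k sampling (k = 20)."""
    # "[GEN]" stands in for the generation token; the actual token is defined
    # in the authors' code.
    prompt = tokenizer.encode(gen_token, return_tensors="pt")
    output = model.generate(
        prompt,
        do_sample=True,
        top_k=TOP_K,
        max_length=max_length,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(output[0], skip_special_tokens=False)
```

During inference on real test examples, the paper applies greedy decoding rather than top-k sampling; the sampling above is only used to generate pseudo-samples of earlier tasks for replay.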