Compositional Language Continual Learning
Authors: Yuanpeng Li, Liang Zhao, Kenneth Church, Mohamed Elhoseiny
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that the proposed method achieves significant improvement over state-of-the-art methods. It enables knowledge transfer and prevents catastrophic forgetting, yielding more than 85% accuracy for up to 100 stages, compared with less than 50% accuracy for baselines on the instruction learning task. It also shows significant improvement on the machine translation task. |
| Researcher Affiliation | Collaboration | Yuanpeng Li, Liang Zhao, Kenneth Church (Baidu Research); Mohamed Elhoseiny (KAUST, Stanford University) |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at: https://github.com/yli1/CLCL. |
| Open Datasets | Yes | We extend the grammar in the SCAN dataset (Lake & Baroni, 2017) to generate data. The machine translation dataset is generated similarly from the translation dataset in SCAN (Lake & Baroni, 2017). |
| Dataset Splits | Yes | We use one set as the initial-stage training data (6,601 samples) and reserve the other set as an initial dataset for evaluating catastrophic forgetting in the continual stages (Forget, 6,602 samples). The reserved data is also used to evaluate long-term catastrophic forgetting (Long-forget). We then add Transfer to Forget for the next stage (see the protocol sketch after this table). The machine translation dataset is generated similarly from the translation dataset in SCAN (Lake & Baroni, 2017); the original training data serves as the initial training data and the original test data as the initial test data. |
| Hardware Specification | No | The paper mentions implementing methods with TensorFlow but does not specify any hardware details such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions using 'TensorFlow (Abadi et al., 2016)' but does not provide a specific version number for TensorFlow or any other software dependencies. |
| Experiment Setup | Yes | The state size is h = 32 for the encoder and 2h = 64 for the decoder. We also use kp = 64, kf = 8, and α = 0.1. For EWC (Kirkpatrick et al., 2017a) and MAS (Aljundi et al., 2018), we use a parameter regularization weight of 10 (see the regularization sketch after this table). In the initial stage, the batch size is 512 and training runs for 5,000 steps. In each continual stage, the batch size is 1, as each continual stage contains only one sample, and training runs for 1,000 steps. There are 100 continual stages. |
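
For readers reconstructing the training protocol from the "Dataset Splits" and "Experiment Setup" rows, the following is a minimal sketch of the stage loop, assuming placeholder `train` and `evaluate` helpers (the paper's actual model is a TensorFlow sequence-to-sequence network and is not reproduced here). Only the numeric settings, the 6,601/6,602 split, the one-sample continual stages, and the rule that each Transfer sample is appended to the Forget set come from the table; function names and everything else are illustrative assumptions.

```python
# Sketch of the continual-learning protocol described in the rows above.
# All function bodies are placeholders; the paper's model is a TensorFlow
# seq2seq with encoder state h = 32, decoder state 2h = 64, kp = 64, kf = 8,
# and alpha = 0.1.
import random

INITIAL_BATCH_SIZE, INITIAL_STEPS = 512, 5_000
CONTINUAL_BATCH_SIZE, CONTINUAL_STEPS = 1, 1_000
NUM_CONTINUAL_STAGES = 100


def train(model, data, batch_size, steps):
    """Placeholder for gradient training (assumption; real code uses TensorFlow)."""
    return model


def evaluate(model, data):
    """Placeholder accuracy metric (assumption)."""
    return random.random()


def run_protocol(initial_train, forget_set, continual_samples, model=None):
    """Initial stage on 6,601 samples, then 100 one-sample continual stages.

    `forget_set` (6,602 reserved samples) measures catastrophic forgetting;
    after each stage the new Transfer sample is appended to it, as described
    in the Dataset Splits row.
    """
    model = train(model, initial_train, INITIAL_BATCH_SIZE, INITIAL_STEPS)
    history = []
    for stage, sample in enumerate(continual_samples[:NUM_CONTINUAL_STAGES], 1):
        model = train(model, [sample], CONTINUAL_BATCH_SIZE, CONTINUAL_STEPS)
        history.append({
            "stage": stage,
            "forget_acc": evaluate(model, forget_set),   # catastrophic forgetting
            "transfer_acc": evaluate(model, [sample]),   # knowledge transfer
        })
        forget_set = forget_set + [sample]  # Transfer sample joins the Forget set
    return history
```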
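
The EWC and MAS baselines in the "Experiment Setup" row both use a parameter-regularization weight of 10. The sketch below shows the generic quadratic penalty those methods apply, purely as an illustration of where that weight enters; the function name, pure-Python form, and example values are assumptions, not the authors' TensorFlow code.

```python
# Minimal sketch of the parameter-regularization penalty used by the EWC / MAS
# baselines, with the weight of 10 reported in the paper. Illustrative only.
REG_WEIGHT = 10.0  # value reported for both EWC and MAS baselines


def regularization_penalty(params, old_params, importance):
    """Quadratic penalty: weight * sum_i importance_i * (theta_i - theta_i*)^2.

    For EWC, `importance` is the diagonal Fisher information; for MAS it is the
    accumulated gradient magnitude of the output w.r.t. each parameter.
    (Conventions vary on an extra 1/2 factor; omitted here.)
    """
    return REG_WEIGHT * sum(
        w * (p - p_old) ** 2
        for p, p_old, w in zip(params, old_params, importance)
    )


# Example: three scalar parameters drifting from their previous-stage values.
penalty = regularization_penalty(
    params=[0.5, -1.2, 0.3],
    old_params=[0.4, -1.0, 0.3],
    importance=[2.0, 0.5, 1.0],
)
```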