Episodic Memory in Lifelong Language Learning
Authors: Cyprien de Masson d'Autume, Sebastian Ruder, Lingpeng Kong, Dani Yogatama
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on text classification and question answering demonstrate the complementary benefits of sparse experience replay and local adaptation to allow the model to continuously learn from new datasets. We evaluate our proposed model against several baselines on text classification and question answering tasks. Table 1 provides a summary of our main results. |
| Researcher Affiliation | Industry | Cyprien de Masson d'Autume, Sebastian Ruder, Lingpeng Kong, Dani Yogatama DeepMind London, United Kingdom {cyprien,ruder,lingpenk,dyogatama}@google.com |
| Pseudocode | Yes | Algorithm 1 Training. Algorithm 2 Inference. |
| Open Source Code | No | The paper mentions 'https://github.com/google-research/bert' but this refers to the BERT model that the authors used, not the source code for their specific methodology described in the paper. |
| Open Datasets | Yes | We use publicly available text classification datasets from Zhang et al. (2015) to evaluate our models (http://goo.gl/JyCnZq). We use three question answering datasets: SQuAD 1.1 (Rajpurkar et al., 2016), TriviaQA (Joshi et al., 2017), and QuAC (Choi et al., 2018). |
| Dataset Splits | Yes | We create a balanced version of all datasets used in our experiments by randomly sampling 115,000 training examples and 7,600 test examples from all datasets (i.e., the size of the smallest training and test sets). In total, we have 575,000 training examples and 38,000 test examples. SQuAD... It includes almost 90,000 training examples and 10,000 validation examples. TriviaQA... 76,000 training examples and 10,000 (unverified) validation examples, whereas the Wikipedia section has about 60,000 training examples and 8,000 validation examples. QuAC... 80,000 training examples and approximately 7,000 validation examples. |
| Hardware Specification | Yes | For each experiment, we use 4 Intel Skylake x86-64 CPUs at 2 GHz, 1 Nvidia Tesla V100 GPU, and 20 GB of RAM. |
| Software Dependencies | No | The paper mentions using a 'pretrained BERT-BASE model' and 'Adam' as an optimizer, but it does not specify version numbers for any software libraries or frameworks (e.g., Python, TensorFlow, PyTorch) that would be needed for replication. |
| Experiment Setup | Yes | We use Adam (Kingma & Ba, 2015) as our optimizer. We set dropout (Srivastava et al., 2014) to 0.1 and λ in Eq. 1 to 0.001. We set the base learning rate to 3e-5 (based on preliminary experiments, in line with the suggested learning rate for using BERT). For text classification, we use a training batch of size 32. For question answering, the batch size is 8. The only hyperparameter that we tune is the local adaptation learning rate {5e-3, 1e-3}. We set the number of neighbors K = 32 and the number of local adaptation steps L = 30. |
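The balanced-split procedure quoted in the "Dataset Splits" row (sample 115,000 training and 7,600 test examples per dataset) is straightforward to reproduce. Below is a minimal sketch of that sampling step; the function name and the shape of the `datasets` mapping are illustrative assumptions, not taken from the authors' code.

```python
import random

def make_balanced_splits(datasets, n_train=115_000, n_test=7_600, seed=0):
    """Sketch of the paper's balanced-split construction: from each
    dataset, randomly sample a fixed number of train/test examples
    (the sizes of the smallest train and test sets across datasets).
    `datasets` maps a dataset name to a (train_pool, test_pool) pair."""
    rng = random.Random(seed)
    train, test = [], []
    for name, (train_pool, test_pool) in datasets.items():
        train.extend((name, ex) for ex in rng.sample(train_pool, n_train))
        test.extend((name, ex) for ex in rng.sample(test_pool, n_test))
    return train, test
```

With the paper's five text classification datasets this yields the reported 575,000 training and 38,000 test examples in total.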
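The "Pseudocode" row notes that training is given as Algorithm 1, whose key ingredient is sparse experience replay from an episodic memory. A toy sketch of that control flow, under the assumption that every example is written to memory and replay happens at a fixed sparse interval (the interval and replay batch size here are placeholders, not the paper's exact replay rate):

```python
import random

def train_with_sparse_replay(stream, train_step, memory,
                             replay_every=100, replay_batch=32, seed=0):
    """Sketch of training with an episodic memory and sparse replay:
    take a gradient step on each incoming batch, write its examples to
    memory, and at a sparse interval take an extra gradient step on a
    random sample drawn from memory. `train_step` stands in for one
    optimizer update on the underlying model."""
    rng = random.Random(seed)
    for step, batch in enumerate(stream, start=1):
        train_step(batch)              # update on the new batch
        memory.extend(batch)           # write examples to episodic memory
        if step % replay_every == 0 and memory:
            replay = rng.sample(memory, min(replay_batch, len(memory)))
            train_step(replay)         # sparse experience replay update
    return memory
```

Keeping replay sparse is what makes the approach practical: the model revisits old tasks often enough to limit forgetting without retraining on the full memory.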