Episodic Memory in Lifelong Language Learning

Authors: Cyprien de Masson d'Autume, Sebastian Ruder, Lingpeng Kong, Dani Yogatama

Venue: NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on text classification and question answering demonstrate the complementary benefits of sparse experience replay and local adaptation to allow the model to continuously learn from new datasets. We evaluate our proposed model against several baselines on text classification and question answering tasks. Table 1 provides a summary of our main results.
Researcher Affiliation | Industry | Cyprien de Masson d'Autume, Sebastian Ruder, Lingpeng Kong, Dani Yogatama. DeepMind, London, United Kingdom. {cyprien,ruder,lingpenk,dyogatama}@google.com
Pseudocode | Yes | Algorithm 1 Training. Algorithm 2 Inference. (Hedged sketches of both algorithms follow the table.)
Open Source Code | No | The paper mentions 'https://github.com/google-research/bert', but this refers to the BERT model that the authors used, not the source code for their specific methodology described in the paper.
Open Datasets | Yes | We use publicly available text classification datasets from Zhang et al. (2015) to evaluate our models (http://goo.gl/JyCnZq). We use three question answering datasets: SQuAD 1.1 (Rajpurkar et al., 2016), TriviaQA (Joshi et al., 2017), and QuAC (Choi et al., 2018).
Dataset Splits | Yes | We create a balanced version of all datasets used in our experiments by randomly sampling 115,000 training examples and 7,600 test examples from all datasets (i.e., the size of the smallest training and test sets). In total, we have 575,000 training examples and 38,000 test examples. SQuAD... It includes almost 90,000 training examples and 10,000 validation examples. TriviaQA... 76,000 training examples and 10,000 (unverified) validation examples, whereas the Wikipedia section has about 60,000 training examples and 8,000 validation examples. QuAC... 80,000 training examples and approximately 7,000 validation examples. (The balanced-split construction is sketched after the table.)
Hardware Specification | Yes | For each experiment, we use 4 Intel Skylake x86-64 CPUs at 2 GHz, 1 Nvidia Tesla V100 GPU, and 20 GB of RAM.
Software Dependencies | No | The paper mentions using a 'pretrained BERTBASE model' and 'Adam' as an optimizer, but it does not specify version numbers for any software libraries or frameworks (e.g., Python, TensorFlow, PyTorch) that would be needed for replication.
Experiment Setup | Yes | We use Adam (Kingma & Ba, 2015) as our optimizer. We set dropout (Srivastava et al., 2014) to 0.1 and λ in Eq. 1 to 0.001. We set the base learning rate to 3e-5 (based on preliminary experiments, in line with the suggested learning rate for using BERT). For text classification, we use a training batch of size 32. For question answering, the batch size is 8. The only hyperparameter that we tune is the local adaptation learning rate {5e-3, 1e-3}. We set the number of neighbors K = 32 and the number of local adaptation steps L = 30. (These settings are used in the sketches after the table.)
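
As an illustration of the balanced splits quoted in the Dataset Splits row, the following is a minimal Python sketch of subsampling every dataset to 115,000 training and 7,600 test examples. The load_dataset helper and its return format are hypothetical placeholders, not the authors' code.

    import random

    TRAIN_PER_DATASET = 115_000  # size of the smallest training set
    TEST_PER_DATASET = 7_600     # size of the smallest test set

    def load_dataset(name):
        # Placeholder for loading one of the Zhang et al. (2015) datasets.
        raise NotImplementedError

    def build_balanced_splits(dataset_names, seed=0):
        """Randomly subsample every dataset to the same train/test size."""
        rng = random.Random(seed)
        train, test = [], []
        for name in dataset_names:
            train_examples, test_examples = load_dataset(name)
            train.extend(rng.sample(train_examples, TRAIN_PER_DATASET))
            test.extend(rng.sample(test_examples, TEST_PER_DATASET))
        # With five datasets this yields 575,000 training and 38,000 test examples.
        return train, test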
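
Algorithm 1 (Training) combines ordinary gradient updates with writes to an episodic memory and sparse experience replay. The sketch below is a schematic reconstruction under the quoted hyperparameters (Adam, base learning rate 3e-5); the replay interval, replay batch size, the memory write/sample interface, and the model returning a .loss attribute are assumptions for illustration, not the paper's released implementation.

    import torch

    def train(model, memory, data_stream, lr=3e-5,
              replay_interval=100, replay_batch_size=32):
        """Schematic lifelong training loop with sparse experience replay."""
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        for step, batch in enumerate(data_stream):
            # Ordinary update on the incoming batch.
            loss = model(batch).loss          # assumed: forward pass returns .loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Store the new examples in episodic memory (assumed interface).
            memory.write(batch)

            # Sparse experience replay: occasionally revisit stored examples.
            if step > 0 and step % replay_interval == 0:
                replay_loss = model(memory.sample(replay_batch_size)).loss
                optimizer.zero_grad()
                replay_loss.backward()
                optimizer.step()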
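
Algorithm 2 (Inference) performs local adaptation: for each test example it retrieves K = 32 neighbors from memory, takes L = 30 gradient steps on them starting from the trained weights while penalizing drift from those weights with weight λ = 0.001, predicts, and discards the adapted copy. The sketch below assumes a nearest-neighbor retrieval interface on the memory and uses plain SGD with one of the quoted local adaptation learning rates (1e-3); these choices are illustrative, not taken from the authors' code.

    import copy
    import torch

    def locally_adapted_predict(model, memory, example,
                                K=32, L=30, adapt_lr=1e-3, lam=1e-3):
        """Schematic local adaptation at inference time."""
        neighbors = memory.nearest_neighbors(example, k=K)   # assumed retrieval interface
        base_params = [p.detach().clone() for p in model.parameters()]

        adapted = copy.deepcopy(model)                       # adapt a throwaway copy
        optimizer = torch.optim.SGD(adapted.parameters(), lr=adapt_lr)
        for _ in range(L):
            loss = adapted(neighbors).loss                   # assumed: forward pass returns .loss
            # L2 penalty keeping the adapted weights close to the base weights.
            reg = sum(((p - p0) ** 2).sum()
                      for p, p0 in zip(adapted.parameters(), base_params))
            total = loss + lam * reg
            optimizer.zero_grad()
            total.backward()
            optimizer.step()

        with torch.no_grad():
            return adapted(example)                          # prediction with adapted weights

Because the adapted copy is thrown away after each prediction, the base parameters remain unchanged across test examples, which is what makes the adaptation "local".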