Transformer with Memory Replay

Authors: Rui Liu, Barzan Mozafari

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on GLUE and SQuAD benchmark datasets show that Transformer with Memory Replay achieves at least 1% point increase compared to the baseline transformer model when pretrained with the same number of examples.
Researcher Affiliation | Academia | Rui Liu, Barzan Mozafari, Computer Science and Engineering, University of Michigan, Ann Arbor, {ruixliu, mozafari}@umich.edu
Pseudocode | No | The paper refers to an algorithm from another paper but does not provide pseudocode or an algorithm block within its own content.
Open Source Code | No | The paper does not provide an explicit statement or link to its open-source code.
Open Datasets | Yes | We pre-train our model with two different sizes: a small model and a base model on English Wikipedia. We use two commonly used datasets as the benchmark to evaluate performance: General Language Understanding Evaluation (GLUE) (Wang et al. 2018) and Stanford Question Answering Dataset (SQuAD) (Rajpurkar et al. 2016).
Dataset Splits | Yes | We use two commonly used datasets as the benchmark to evaluate performance: General Language Understanding Evaluation (GLUE) (Wang et al. 2018) and Stanford Question Answering Dataset (SQuAD) (Rajpurkar et al. 2016). Unless stated otherwise, results are on the dev set.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions using 'Adam with warmup' for pre-training but does not provide specific software dependencies with version numbers (e.g., PyTorch, TensorFlow, or other library versions).
Experiment Setup | Yes | We use Adam with warmup to pre-train the models. The detailed setup is the same as Clark et al. (2020) if not stated otherwise. Specifically, we set ϵ = 1e-6, β1 = 0.9 and β2 = 0.999. The mini-batch size is 128 for the small model and 256 for the base model. The memory buffer size N is set to 1k in our experiments.
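The Open Datasets and Dataset Splits rows quote the GLUE and SQuAD benchmarks with evaluation on the dev set. A minimal sketch of obtaining those dev splits is shown below, assuming the Hugging Face `datasets` library, which the paper does not mention; the choice of MRPC as the example GLUE task is also an assumption for illustration.

```python
# A minimal sketch, assuming the Hugging Face `datasets` library (the paper
# does not state which data-loading tooling was used). GLUE is a suite of
# tasks; MRPC is shown only as one example task name.
from datasets import load_dataset

glue_mrpc = load_dataset("glue", "mrpc")  # GLUE benchmark (Wang et al. 2018)
squad = load_dataset("squad")             # SQuAD v1.1 (Rajpurkar et al. 2016)

# "Results are on the dev set": the dev set corresponds to the validation split.
glue_dev = glue_mrpc["validation"]
squad_dev = squad["validation"]

print(f"GLUE/MRPC dev examples: {len(glue_dev)}")
print(f"SQuAD dev examples: {len(squad_dev)}")
```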
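The Experiment Setup row reports Adam with warmup, ϵ = 1e-6, β1 = 0.9, β2 = 0.999, mini-batch sizes of 128 (small) and 256 (base), and a memory buffer of size 1k. The sketch below shows one way to express that configuration, assuming PyTorch; the framework, learning rate, warmup/total step counts, and the placeholder model are not specified in the paper and are assumptions here.

```python
# A minimal sketch of the reported optimizer configuration, assuming PyTorch.
# The learning rate, schedule lengths, and model are placeholders, not values
# from the paper.
import torch
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(768, 768)  # placeholder for the transformer model

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=5e-4,             # assumed; not specified in the quoted setup
    betas=(0.9, 0.999),  # β1, β2 from the paper
    eps=1e-6,            # ϵ from the paper
)

warmup_steps, total_steps = 10_000, 1_000_000  # assumed schedule lengths

def warmup_then_linear_decay(step: int) -> float:
    """Linear warmup followed by linear decay, a common 'Adam with warmup' schedule."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

scheduler = LambdaLR(optimizer, lr_lambda=warmup_then_linear_decay)

batch_size_small, batch_size_base = 128, 256  # mini-batch sizes from the paper
memory_buffer_size = 1_000                    # memory buffer size N = 1k
```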