Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Transformer with Memory Replay
Authors: Rui Liu, Barzan Mozafari | Pages: 7567-7575
AAAI 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on GLUE and SQuAD benchmark datasets show that Transformer with Memory Replay achieves at least 1% point increase compared to the baseline transformer model when pretrained with the same number of examples. |
| Researcher Affiliation | Academia | Rui Liu, Barzan Mozafari Computer Science and Engineering, University of Michigan, Ann Arbor EMAIL |
| Pseudocode | No | The paper refers to an algorithm from another paper but does not provide pseudocode or an algorithm block within its own content. |
| Open Source Code | No | The paper does not provide an explicit statement or link to its open-source code. |
| Open Datasets | Yes | We pre-train our model with two different sizes: a small model and a base model on English Wikipedia. We use two commonly used datasets as the benchmark to evaluate performance: General Language Understanding Evaluation (GLUE) (Wang et al. 2018) and Stanford Question Answering Dataset (SQuAD) (Rajpurkar et al. 2016). |
| Dataset Splits | Yes | We use two commonly used datasets as the benchmark to evaluate performance: General Language Understanding Evaluation (GLUE) (Wang et al. 2018) and Stanford Question Answering Dataset (SQuAD) (Rajpurkar et al. 2016). Unless stated otherwise, results are on the dev set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'Adam with warmup' for pre-training but does not provide specific software dependencies with version numbers (e.g., PyTorch version, TensorFlow version, or other library versions). |
| Experiment Setup | Yes | We use Adam with warmup to pre-train the models. The detailed setup is the same as Clark et al. (2020) if not stated otherwise. Specifically, we set ϵ = 1e-6, β1 = 0.9 and β2 = 0.999. The mini-batch size is 128 for the small model and 256 for the base model. The memory buffer size N is set to 1k in our experiments. |
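
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration, shown here as a minimal Python sketch rather than the authors' implementation. Only ϵ, β1, β2, the two batch sizes, and the buffer size N come from the row above. The names `PretrainConfig`, `warmup_lr`, and `adam_step`, the values of `peak_lr` and `warmup_steps`, and the linear warmup shape are hypothetical placeholders, since the row defers those details to Clark et al. (2020).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PretrainConfig:
    """Values quoted in the Experiment Setup row, plus labeled placeholders."""
    adam_eps: float = 1e-6           # quoted: epsilon = 1e-6
    adam_beta1: float = 0.9          # quoted: beta1 = 0.9
    adam_beta2: float = 0.999        # quoted: beta2 = 0.999
    batch_size: int = 128            # quoted: 128 (small model), 256 (base model)
    memory_buffer_size: int = 1000   # quoted: N is set to 1k
    # NOT stated in the row; hypothetical placeholders for illustration only.
    peak_lr: float = 1e-3
    warmup_steps: int = 10_000


SMALL = PretrainConfig(batch_size=128)
BASE = PretrainConfig(batch_size=256)


def warmup_lr(step: int, cfg: PretrainConfig) -> float:
    """Linear warmup to peak_lr, then constant (the schedule shape is assumed)."""
    return cfg.peak_lr * min(1.0, step / cfg.warmup_steps)


def adam_step(param, grad, m, v, t, cfg):
    """One scalar Adam update with bias correction; t is the 1-based step count."""
    m = cfg.adam_beta1 * m + (1 - cfg.adam_beta1) * grad
    v = cfg.adam_beta2 * v + (1 - cfg.adam_beta2) * grad * grad
    m_hat = m / (1 - cfg.adam_beta1 ** t)
    v_hat = v / (1 - cfg.adam_beta2 ** t)
    param = param - warmup_lr(t, cfg) * m_hat / (v_hat ** 0.5 + cfg.adam_eps)
    return param, m, v
```

Calling `adam_step(1.0, 0.5, 0.0, 0.0, 1, SMALL)` applies a single update to a scalar parameter. During early warmup the step is very small because the learning rate is scaled by `step / warmup_steps`.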