Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Root Mean Square Layer Normalization
Authors: Biao Zhang, Rico Sennrich
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on several tasks using diverse network architectures show that RMSNorm achieves comparable performance against LayerNorm but reduces the running time by 7%~64% on different models. |
| Researcher Affiliation | Academia | Biao Zhang (1), Rico Sennrich (2, 1); (1) School of Informatics, University of Edinburgh; (2) Institute of Computational Linguistics, University of Zurich; EMAIL, EMAIL |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code is available at https://github.com/bzhangGo/rmsnorm. |
| Open Datasets | Yes | We train two different models, a GRU-based RNNSearch [4] and a self-attention based neural Transformer [31] on WMT14 English-German translation task. We train an order-embedding model (OE) proposed by Vendrov et al. [32] on the Microsoft COCO dataset [17] using their public source code in Theano. CIFAR-10 is a supervised image classification task, with 10 different classes. |
| Dataset Splits | Yes | We train two different models... on WMT14 English-German translation task. We use the newstest2013 dataset. We train an order-embedding model... on the Microsoft COCO dataset [17]. We train a modified version of the ConvPool-CNN-C architecture [15], and follow the same experimental protocol as Salimans and Kingma [22]. |
| Hardware Specification | Yes | Unless otherwise noted, all speed-related statistics are measured on one TITAN X (Pascal). Time: the time in second per 1k training steps, which is measured using Tesla V100. Time is measured with GeForce RTX 2080 Ti. |
| Software Dependencies | No | The paper mentions using TensorFlow, PyTorch, and Theano but does not specify their version numbers. |
| Experiment Setup | No | The paper references external papers for experimental protocols (e.g., 'employ the base setting as in [31]', 'follow the same experimental protocol as Salimans and Kingma [22]') but does not explicitly list concrete hyperparameter values or detailed training configurations within its own main text. |