Root Mean Square Layer Normalization

Authors: Biao Zhang, Rico Sennrich

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on several tasks using diverse network architectures show that RMSNorm achieves comparable performance against LayerNorm but reduces the running time by 7%–64% on different models.
Researcher Affiliation | Academia | Biao Zhang (School of Informatics, University of Edinburgh); Rico Sennrich (Institute of Computational Linguistics, University of Zurich; School of Informatics, University of Edinburgh). B.Zhang@ed.ac.uk, sennrich@cl.uzh.ch
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. (A minimal sketch of the RMSNorm computation is given after this table.)
Open Source Code | Yes | Source code is available at https://github.com/bzhangGo/rmsnorm.
Open Datasets | Yes | We train two different models, a GRU-based RNNSearch [4] and a self-attention based neural Transformer [31], on the WMT14 English-German translation task. We train an order-embedding model (OE) proposed by Vendrov et al. [32] on the Microsoft COCO dataset [17] using their public source code in Theano. CIFAR-10 is a supervised image classification task with 10 different classes.
Dataset Splits | Yes | We train two different models... on the WMT14 English-German translation task. We use the newstest2013 dataset. We train an order-embedding model... on the Microsoft COCO dataset [17]. We train a modified version of the ConvPool-CNN-C architecture [15], and follow the same experimental protocol as Salimans and Kingma [22].
Hardware Specification | Yes | Unless otherwise noted, all speed-related statistics are measured on one TITAN X (Pascal). Time: the time in seconds per 1k training steps, measured on a Tesla V100. Time is measured with a GeForce RTX 2080 Ti.
Software Dependencies | No | The paper mentions using TensorFlow, PyTorch, and Theano but does not specify their version numbers.
Experiment Setup | No | The paper references external papers for experimental protocols (e.g., "employ the base setting as in [31]", "follow the same experimental protocol as Salimans and Kingma [22]") but does not explicitly list concrete hyperparameter values or detailed training configurations in its own main text.
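
Since the paper itself contains no pseudocode or algorithm block, the following is a minimal sketch of the RMSNorm computation in PyTorch, included here only for orientation. It is not the authors' reference implementation (that lives at https://github.com/bzhangGo/rmsnorm); the class name, parameter names, default epsilon, and the placement of epsilon inside the square root are assumptions made for illustration.

```python
# Minimal illustrative sketch of RMSNorm (not the authors' code).
# Assumption: eps is added under the square root for numerical stability.
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-8):
        super().__init__()
        self.eps = eps
        # Learnable per-feature gain g. Unlike LayerNorm, RMSNorm does not
        # mean-center the activations and, in its basic form, uses no bias.
        self.gain = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # RMS(x) = sqrt(mean_i x_i^2), computed over the feature dimension.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).sqrt()
        return x / rms * self.gain
```

For example, `RMSNorm(512)(torch.randn(2, 10, 512))` rescales each 512-dimensional feature vector by its root mean square and applies the learned gain; only the re-scaling (not the re-centering) invariance of LayerNorm is retained, which is where the reported speed advantage comes from.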