On the Nonlinearity of Layer Normalization

Authors: Yunhao Ni, Yuxin Guo, Junlong Jia, Lei Huang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiments are conducted on CIFAR-10 and MNIST with random labels assigned (CIFAR-10-RL and MNIST-RL). We evaluate the classification accuracy on the training set after the model is trained, which indicates the capacity of the models to fit the dataset empirically. We only provide essential components of the experimental setup; for more details, please refer to Appendix F.1.
Researcher Affiliation | Academia | SKLCCSE, Institute of Artificial Intelligence, Beihang University, Beijing, China.
Pseudocode | Yes | Algorithm 1: Projection Merge Algorithm
Open Source Code | No | No explicit statement or link to open-source code for the methodology described in this paper was found.
Open Datasets | Yes | The experiments are conducted on CIFAR-10 and MNIST with random labels assigned (CIFAR-10-RL and MNIST-RL).
Dataset Splits | No | The experiments are conducted on CIFAR-10 and MNIST with random labels assigned (CIFAR-10-RL and MNIST-RL). We evaluate the classification accuracy on the training set after the model is trained, which indicates the capacity of the models to fit the dataset empirically.
Hardware Specification | No | We only provide essential components of the experimental setup; for more details, please refer to Appendix F.1.
Software Dependencies | No | We conduct experiments to apply LN-G on Transformer (Vaswani et al., 2017) (where LN is the default normalization) for machine translation tasks using fairseq-py (Ott et al., 2019).
Experiment Setup | Yes | For the training of the linear classifier, we apply both the SGD optimizer with momentum (0.1) and the Adam optimizer with betas (0.9, 0.999). We train the model for 150 epochs and use a learning rate schedule with a decay of 0.5 every 20 epochs. We search over batch sizes in {128, 256}, initial learning rates in {0.001, 0.003, 0.005, 0.008, 0.05, 0.08, 0.1, 0.15}, and 5 random seeds, and report the best accuracy among these hyper-parameter configurations.
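
The Research Type and Open Datasets rows above describe the random-label variants of the datasets (CIFAR-10-RL and MNIST-RL), where training accuracy is used as an empirical measure of fitting capacity. Below is a minimal sketch of how such variants could be constructed, assuming PyTorch and torchvision; the wrapper class name RandomLabelDataset is illustrative and not taken from the paper.

```python
# Illustrative construction of random-label datasets (CIFAR-10-RL, MNIST-RL).
# Assumes PyTorch + torchvision; not the authors' released code.
import torch
from torch.utils.data import Dataset
from torchvision import datasets, transforms


class RandomLabelDataset(Dataset):
    """Wraps a dataset and replaces every label with a fixed random one."""

    def __init__(self, base: Dataset, num_classes: int, seed: int = 0):
        self.base = base
        gen = torch.Generator().manual_seed(seed)
        # Labels are resampled once and then kept fixed, so the model must
        # memorize them; training accuracy then reflects fitting capacity.
        self.labels = torch.randint(num_classes, (len(base),), generator=gen)

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        x, _ = self.base[idx]
        return x, int(self.labels[idx])


transform = transforms.ToTensor()
cifar10_rl = RandomLabelDataset(
    datasets.CIFAR10("data", train=True, download=True, transform=transform),
    num_classes=10,
)
mnist_rl = RandomLabelDataset(
    datasets.MNIST("data", train=True, download=True, transform=transform),
    num_classes=10,
)
```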
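The Experiment Setup row quotes a concrete hyper-parameter search: SGD with momentum 0.1 or Adam with betas (0.9, 0.999), 150 epochs, the learning rate halved every 20 epochs, batch sizes {128, 256}, the listed initial learning rates, and 5 random seeds, with the best training accuracy reported. The sketch below shows one way to organize that sweep, assuming PyTorch; build_model is a placeholder for the paper's linear-classifier model, which is not specified in the excerpt.

```python
# Hedged sketch of the hyper-parameter sweep described in the Experiment Setup
# row. Assumes PyTorch; build_model() is a user-supplied placeholder.
import itertools
import torch
from torch import nn, optim
from torch.utils.data import DataLoader

BATCH_SIZES = [128, 256]
LEARNING_RATES = [0.001, 0.003, 0.005, 0.008, 0.05, 0.08, 0.1, 0.15]
SEEDS = range(5)
OPTIMIZERS = ["sgd", "adam"]


def train_accuracy(model, loader):
    # Classification accuracy on the training set, used as a capacity measure.
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total


def run_one(dataset, build_model, batch_size, lr, seed, opt_name, epochs=150):
    torch.manual_seed(seed)
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    model = build_model()
    if opt_name == "sgd":
        optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.1)
    else:
        optimizer = optim.Adam(model.parameters(), lr=lr, betas=(0.9, 0.999))
    # Decay the learning rate by 0.5 every 20 epochs, as described in the setup.
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        model.train()
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return train_accuracy(model, loader)


def best_training_accuracy(dataset, build_model):
    # Grid search over all configurations; report the best training accuracy.
    configs = itertools.product(OPTIMIZERS, BATCH_SIZES, LEARNING_RATES, SEEDS)
    return max(run_one(dataset, build_model, b, lr, s, o) for o, b, lr, s in configs)
```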