Fixup Initialization: Residual Learning Without Normalization

Authors: Hongyi Zhang, Yann N. Dauphin, Tengyu Ma

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We apply Fixup to replace batch normalization on image classification benchmarks CIFAR-10 (with Wide ResNet) and ImageNet (with ResNet), and find Fixup with proper regularization matches the well-tuned baseline trained with normalization." (Section 4.2) "Machine translation. We apply Fixup to replace layer normalization on machine translation benchmarks IWSLT and WMT using the Transformer model, and find it outperforms the baseline and achieves new state-of-the-art results on the same architecture." (Section 4.3)
Researcher Affiliation | Collaboration | Hongyi Zhang (MIT, hongyiz@mit.edu), Yann N. Dauphin (Google Brain, yann@dauphin.io), Tengyu Ma (Stanford University, tengyuma@stanford.edu). Work done at Facebook; Zhang and Dauphin contributed equally.
Pseudocode | No | The 'Fixup initialization' steps are presented as a numbered list within a paragraph, not in a clearly labeled 'Pseudocode' or 'Algorithm' block (a hedged sketch of these steps appears after this table).
Open Source Code | No | No explicit statement about releasing source code or a link to a code repository for the methodology described in the paper.
Open Datasets | Yes | "We apply Fixup to replace batch normalization on image classification benchmarks CIFAR-10 (with Wide ResNet) and ImageNet (with ResNet)..."
Dataset Splits | Yes | "Best Mixup coefficients are found through cross-validation: they are 0.2, 0.1 and 0.7 for Batch Norm, Group Norm (Wu & He, 2018) and Fixup respectively."
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) are provided.
Software Dependencies | No | "Specifically, we use the fairseq library (Gehring et al., 2017) and follow the Fixup template in Section 3 to modify the baseline model." (No version is specified for fairseq or other libraries.)
Experiment Setup | Yes | "We use the default batch size of 128 up to 1000 layers, with a batch size of 64 for 10,000 layers."
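
Because the paper presents Fixup initialization only as a numbered list and no code release is noted above, the following is a minimal PyTorch sketch of those steps for a basic residual block, assuming m = 2 convolutions per residual branch. The class and argument names (FixupBasicBlock, num_branches) are illustrative, not taken from the authors' code. Step 2 rescales the first convolution of each branch by L^(-1/(2m-2)), where L is the total number of residual branches in the network.

```python
import torch
import torch.nn as nn

class FixupBasicBlock(nn.Module):
    """Sketch of a residual block using Fixup initialization (no normalization)."""

    def __init__(self, channels, num_branches, m=2):
        super().__init__()
        # Step 3: scalar biases before each convolution/activation and a
        # scalar multiplier on the residual branch, all learnable.
        self.bias1a = nn.Parameter(torch.zeros(1))
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bias1b = nn.Parameter(torch.zeros(1))
        self.relu = nn.ReLU(inplace=True)
        self.bias2a = nn.Parameter(torch.zeros(1))
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.scale = nn.Parameter(torch.ones(1))
        self.bias2b = nn.Parameter(torch.zeros(1))

        # Step 2: standard (He) init, then rescale the non-final branch
        # weights by L^(-1/(2m-2)), with L = num_branches.
        nn.init.kaiming_normal_(self.conv1.weight, mode='fan_out',
                                nonlinearity='relu')
        with torch.no_grad():
            self.conv1.weight.mul_(num_branches ** (-1.0 / (2 * m - 2)))
        # Step 1: the last layer of the residual branch starts at zero.
        nn.init.zeros_(self.conv2.weight)

    def forward(self, x):
        out = self.conv1(x + self.bias1a)
        out = self.relu(out + self.bias1b)
        out = self.conv2(out + self.bias2a)
        return self.relu(out * self.scale + self.bias2b + x)

# Example: one block in a hypothetical network with 8 residual branches.
block = FixupBasicBlock(channels=64, num_branches=8)
```

In a full network, the final classification layer's weight and bias would also be zero-initialized (step 1), and no BatchNorm or LayerNorm layers are used anywhere.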