High-Performance Large-Scale Image Recognition Without Normalization

Authors: Andy Brock, Soham De, Samuel L. Smith, Karen Simonyan

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we develop an adaptive gradient clipping technique which overcomes these instabilities, and design a significantly improved class of Normalizer-Free ResNets. Our smaller models match the test accuracy of an EfficientNet-B7 on ImageNet while being up to 8.7x faster to train, and our largest models attain a new state-of-the-art top-1 accuracy of 86.5%.
Researcher Affiliation | Industry | DeepMind, London, United Kingdom. Correspondence to: Andrew Brock <ajbrock@deepmind.com>.
Pseudocode | No | The paper describes algorithms using mathematical formulas but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code and pretrained models are available at https://github.com/deepmind/deepmind-research/tree/master/nfnets
Open Datasets | Yes | We now turn our attention to evaluating our NFNet models on ImageNet... (Russakovsky et al., 2015) ...object detection on COCO (Lin et al., 2014).
Dataset Splits | Yes | Figure 1. ImageNet Validation Accuracy vs Training Latency.
Hardware Specification | Yes | Latencies are given as the time in milliseconds required to perform a single full training step on TPU or GPU (V100).
Software Dependencies | No | The paper mentions software components like JAX, Haiku, and NumPy, but it does not provide specific version numbers for these dependencies, which are necessary for full reproducibility.
Experiment Setup | Yes | We performed experiments on pre-activation NF-ResNet-50 and NF-ResNet-200 on ImageNet, trained using SGD with Nesterov's Momentum for 90 epochs at a range of batch sizes between 256 and 4096. As in Goyal et al. (2017), we use a base learning rate of 0.1 for batch size 256, which is scaled linearly with the batch size. We now turn our attention to evaluating our NFNet models on ImageNet, beginning with an ablation of our architectural modifications when training for 360 epochs at batch size 4096. We use Nesterov's Momentum with a momentum coefficient of 0.9, AGC as described in Section 4 with a clipping threshold of 0.01, and a learning rate which linearly increases from 0 to 1.6 over 5 epochs, before decaying to zero with cosine annealing (Loshchilov & Hutter, 2017). (Sketches of the AGC step and this learning-rate schedule follow the table.)
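
For reference, a minimal sketch of the adaptive gradient clipping (AGC) step quoted above, written against jax.numpy: a gradient unit is rescaled whenever its norm exceeds the clipping threshold times the norm of the matching parameter unit. The unit-wise norm convention (output dimension first) and the helper names are assumptions for illustration, not the released implementation.

    import jax.numpy as jnp

    def _unitwise_norm(x):
        # Frobenius norm per output unit: reduce over every axis except the
        # first for matrices/conv kernels; biases and gains use their full norm.
        if x.ndim <= 1:
            return jnp.sqrt(jnp.sum(x * x))
        axes = tuple(range(1, x.ndim))
        return jnp.sqrt(jnp.sum(x * x, axis=axes, keepdims=True))

    def adaptive_grad_clip(grad, param, clipping=0.01, eps=1e-3):
        # Rescale any unit whose gradient norm exceeds clipping * parameter norm.
        max_norm = clipping * jnp.maximum(_unitwise_norm(param), eps)
        grad_norm = _unitwise_norm(grad)
        scale = max_norm / jnp.maximum(grad_norm, 1e-6)  # guard divide-by-zero
        return jnp.where(grad_norm > max_norm, grad * scale, grad)

In a training loop this would be applied parameter-by-parameter (for example via jax.tree_util.tree_map over the gradient and parameter pytrees) before the Nesterov momentum update, with clipping = 0.01 as quoted in the experiment setup.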
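
Likewise, the quoted learning-rate recipe (a base rate of 0.1 at batch size 256 scaled linearly with batch size, a 5-epoch linear warmup from 0, then cosine annealing to zero) can be sketched as a plain step-to-rate function; the function name, signature, and the 360-epoch default are illustrative.

    import math

    def lr_at_step(step, steps_per_epoch, batch_size, base_lr=0.1,
                   base_batch=256, warmup_epochs=5, total_epochs=360):
        # Peak rate follows the linear scaling rule (0.1 at batch size 256).
        peak_lr = base_lr * batch_size / base_batch
        warmup_steps = warmup_epochs * steps_per_epoch
        total_steps = total_epochs * steps_per_epoch
        if step < warmup_steps:
            # Linear warmup from 0 to the peak rate over the first 5 epochs.
            return peak_lr * step / warmup_steps
        # Cosine annealing from the peak rate down to zero.
        progress = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
        return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

At batch size 4096 the scaling rule gives a peak rate of 0.1 * 4096 / 256 = 1.6, matching the schedule quoted in the experiment setup.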