High-Performance Large-Scale Image Recognition Without Normalization
Authors: Andy Brock, Soham De, Samuel L Smith, Karen Simonyan
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we develop an adaptive gradient clipping technique which overcomes these instabilities, and design a significantly improved class of Normalizer-Free ResNets. Our smaller models match the test accuracy of an EfficientNet-B7 on ImageNet while being up to 8.7× faster to train, and our largest models attain a new state-of-the-art top-1 accuracy of 86.5%. (A hedged sketch of the AGC step is given after the table.) |
| Researcher Affiliation | Industry | DeepMind, London, United Kingdom. Correspondence to: Andrew Brock <ajbrock@deepmind.com>. |
| Pseudocode | No | The paper describes algorithms using mathematical formulas but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and pretrained models are available at https://github.com/deepmind/deepmind-research/tree/master/nfnets |
| Open Datasets | Yes | We now turn our attention to evaluating our NFNet models on ImageNet... (Russakovsky et al., 2015) ...object detection on COCO (Lin et al., 2014). |
| Dataset Splits | Yes | Figure 1. ImageNet Validation Accuracy vs Training Latency. |
| Hardware Specification | Yes | Latencies are given as the time in milliseconds required to perform a single full training step on TPU or GPU (V100). |
| Software Dependencies | No | The paper mentions software components like JAX, Haiku, and NumPy, but it does not provide specific version numbers for these dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | We performed experiments on pre-activation NF-ResNet-50 and NF-ResNet-200 on ImageNet, trained using SGD with Nesterov's Momentum for 90 epochs at a range of batch sizes between 256 and 4096. As in Goyal et al. (2017) we use a base learning rate of 0.1 for batch size 256, which is scaled linearly with the batch size. We now turn our attention to evaluating our NFNet models on ImageNet, beginning with an ablation of our architectural modifications when training for 360 epochs at batch size 4096. We use Nesterov's Momentum with a momentum coefficient of 0.9, AGC as described in Section 4 with a clipping threshold of 0.01, and a learning rate which linearly increases from 0 to 1.6 over 5 epochs, before decaying to zero with cosine annealing (Loshchilov & Hutter, 2017). (A sketch of this warmup-plus-cosine schedule also follows the table.) |
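
The adaptive gradient clipping (AGC) technique cited in the Research Type row clips each gradient unit-wise, based on the ratio of its norm to the norm of the corresponding parameter. Below is a minimal NumPy sketch, assuming the leading axis of a multi-dimensional weight indexes its output units; the small constants and the axis convention are illustrative choices, not taken from the released code, so consult the linked repository for the reference implementation.

```python
import numpy as np

def adaptive_gradient_clip(grad, param, clipping=0.01, eps=1e-3):
    """Unit-wise AGC sketch: rescale each unit's gradient so that its
    norm never exceeds clipping * max(||w_i||, eps)."""
    if param.ndim <= 1:
        # Biases and gains: treat the whole tensor as a single unit.
        g_norm = np.linalg.norm(grad)
        w_norm = np.linalg.norm(param)
    else:
        # Assumption: the leading axis indexes output units; the official
        # repository uses layer-specific axes (e.g. for HWIO conv kernels).
        axes = tuple(range(1, param.ndim))
        g_norm = np.sqrt(np.sum(grad ** 2, axis=axes, keepdims=True))
        w_norm = np.sqrt(np.sum(param ** 2, axis=axes, keepdims=True))
    max_norm = clipping * np.maximum(w_norm, eps)
    # Only units whose gradient norm exceeds the threshold are rescaled.
    scale = np.where(g_norm > max_norm, max_norm / np.maximum(g_norm, 1e-6), 1.0)
    return grad * scale
```

Applying this to every parameter/gradient pair before the SGD update corresponds to the AGC step the paper describes in Section 4 with a clipping threshold of 0.01.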
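The learning-rate schedule quoted in the Experiment Setup row (base rate 0.1 at batch size 256 scaled linearly with batch size, a 5-epoch linear warmup, then cosine decay to zero) can be written as a small helper. This is a hedged sketch: the function name is hypothetical, and starting the cosine decay at the end of warmup is an assumption about the paper's wording rather than a detail confirmed by the released code.

```python
import math

def nfnet_learning_rate(step, steps_per_epoch, batch_size, total_epochs=360,
                        warmup_epochs=5, base_lr=0.1, base_batch=256):
    """Hypothetical schedule: linear scaling with batch size
    (0.1 * 4096 / 256 = 1.6), linear warmup over 5 epochs,
    then cosine annealing to zero."""
    peak_lr = base_lr * batch_size / base_batch
    warmup_steps = warmup_epochs * steps_per_epoch
    total_steps = total_epochs * steps_per_epoch
    if step < warmup_steps:
        # Linear warmup from 0 to the peak learning rate.
        return peak_lr * step / warmup_steps
    # Cosine decay from the peak learning rate down to zero.
    progress = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```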