EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Authors: Mingxing Tan, Quoc Le

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical study shows that it is critical to balance all dimensions of network width/depth/resolution, and surprisingly such balance can be achieved by simply scaling each of them with constant ratio. We demonstrate that our scaling method works well on existing MobileNets (Howard et al., 2017; Sandler et al., 2018) and ResNet (He et al., 2016). Notably, the effectiveness of model scaling heavily depends on the baseline network; to go even further, we use neural architecture search (Zoph & Le, 2017; Tan et al., 2019) to develop a new baseline network, and scale it up to obtain a family of models, called EfficientNets. Figure 1 summarizes the ImageNet performance, where our EfficientNets significantly outperform other ConvNets.
Researcher Affiliation | Industry | Google Research, Brain Team, Mountain View, CA.
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the methodology described is publicly available.
Open Datasets | Yes | We train our EfficientNet models on ImageNet using similar settings as (Tan et al., 2019). ImageNet (Russakovsky et al., 2015).
Dataset Splits | Yes | ImageNet top-1 validation accuracy to 84.3%. Images are randomly picked from ImageNet validation set.
Hardware Specification | Yes | Latency is measured with batch size 1 on a single core of Intel Xeon CPU E5-2690.
Software Dependencies | No | The paper mentions software components like "RMSProp optimizer," "batch norm," "swish activation," "AutoAugment," and "stochastic depth," but it does not specify concrete version numbers for any of these or other software libraries (e.g., Python, TensorFlow, PyTorch versions).
Experiment Setup | Yes | We train our EfficientNet models on ImageNet using similar settings as (Tan et al., 2019): RMSProp optimizer with decay 0.9 and momentum 0.9; batch norm momentum 0.99; weight decay 1e-5; initial learning rate 0.256 that decays by 0.97 every 2.4 epochs. We also use swish activation (Ramachandran et al., 2018; Elfwing et al., 2018), fixed AutoAugment policy (Cubuk et al., 2019), and stochastic depth (Huang et al., 2016) with drop connect ratio 0.3. Since bigger models need more regularization, we linearly increase dropout (Srivastava et al., 2014) ratio from 0.2 for EfficientNet-B0 to 0.5 for EfficientNet-B7.
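
The Research Type response above quotes the paper's compound scaling claim: network depth, width, and input resolution are balanced by scaling each with a constant ratio. Below is a minimal, illustrative Python sketch of that idea using the coefficients alpha = 1.2, beta = 1.1, gamma = 1.15 reported in the paper; the function name `compound_scale` and the baseline values in the example are hypothetical, not taken from the paper or its code release.

```python
# Illustrative sketch of EfficientNet-style compound scaling (not the authors' code).
# ALPHA/BETA/GAMMA are the coefficients the paper reports for depth/width/resolution;
# they already satisfy alpha * beta^2 * gamma^2 ~= 2, so increasing phi by one
# roughly doubles FLOPS.
import math

ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution scaling bases


def compound_scale(phi, base_depth, base_width, base_resolution):
    """Scale a baseline stage's layer count, channel count, and input
    resolution together using a single compound coefficient phi."""
    depth = int(math.ceil(base_depth * ALPHA ** phi))        # more layers
    width = int(round(base_width * BETA ** phi))             # more channels
    resolution = int(round(base_resolution * GAMMA ** phi))  # larger input
    return depth, width, resolution


# Example with a hypothetical baseline stage: 3 layers, 40 channels, 224x224 input.
print(compound_scale(3, base_depth=3, base_width=40, base_resolution=224))
```

Because the constraint ties the three ratios to a roughly 2x FLOPS increase per unit of phi, the paper searches alpha, beta, gamma only once on the small baseline and then varies phi to obtain the B1 to B7 family.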
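
The Experiment Setup response lists the training hyperparameters verbatim. The sketch below turns two of them into runnable form under stated assumptions: "decays by 0.97 every 2.4 epochs" is interpreted as a staircase schedule, and the dropout increase from 0.2 (B0) to 0.5 (B7) is assumed to be a linear interpolation across variants; the function names are illustrative, not from the paper's training code.

```python
# Hyperparameters quoted in the Experiment Setup row, collected as constants.
INITIAL_LR = 0.256
LR_DECAY_FACTOR = 0.97
LR_DECAY_EPOCHS = 2.4
WEIGHT_DECAY = 1e-5
RMSPROP_DECAY = 0.9
RMSPROP_MOMENTUM = 0.9
BN_MOMENTUM = 0.99


def learning_rate(epoch: float) -> float:
    """Staircase exponential decay: multiply by 0.97 every 2.4 epochs
    (assumed interpretation of the quoted schedule)."""
    return INITIAL_LR * LR_DECAY_FACTOR ** int(epoch / LR_DECAY_EPOCHS)


def dropout_rate(variant: int) -> float:
    """Assumed linear interpolation of dropout from 0.2 (B0) to 0.5 (B7)."""
    return 0.2 + (0.5 - 0.2) * variant / 7


if __name__ == "__main__":
    print(learning_rate(epoch=10))   # 0.256 * 0.97**4
    print(dropout_rate(variant=4))   # dropout ratio for EfficientNet-B4
```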