EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Authors: Mingxing Tan, Quoc Le
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical study shows that it is critical to balance all dimensions of network width/depth/resolution, and surprisingly such balance can be achieved by simply scaling each of them with constant ratio. We demonstrate that our scaling method works well on existing MobileNets (Howard et al., 2017; Sandler et al., 2018) and ResNet (He et al., 2016). Notably, the effectiveness of model scaling heavily depends on the baseline network; to go even further, we use neural architecture search (Zoph & Le, 2017; Tan et al., 2019) to develop a new baseline network, and scale it up to obtain a family of models, called EfficientNets. Figure 1 summarizes the ImageNet performance, where our EfficientNets significantly outperform other ConvNets. (A minimal compound-scaling sketch follows the table.) |
| Researcher Affiliation | Industry | Google Research, Brain Team, Mountain View, CA. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the methodology described is publicly available. |
| Open Datasets | Yes | We train our EfficientNet models on ImageNet using similar settings as (Tan et al., 2019). ImageNet (Russakovsky et al., 2015). |
| Dataset Splits | Yes | ImageNet top-1 validation accuracy to 84.3%. Images are randomly picked from ImageNet validation set. |
| Hardware Specification | Yes | Latency is measured with batch size 1 on a single core of Intel Xeon CPU E5-2690. |
| Software Dependencies | No | The paper mentions software components like "RMSProp optimizer," "batch norm," "swish activation," "AutoAugment," and "stochastic depth," but it does not specify concrete version numbers for any of these or other software libraries (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | We train our EfficientNet models on ImageNet using similar settings as (Tan et al., 2019): RMSProp optimizer with decay 0.9 and momentum 0.9; batch norm momentum 0.99; weight decay 1e-5; initial learning rate 0.256 that decays by 0.97 every 2.4 epochs. We also use swish activation (Ramachandran et al., 2018; Elfwing et al., 2018), fixed AutoAugment policy (Cubuk et al., 2019), and stochastic depth (Huang et al., 2016) with drop connect ratio 0.3. As bigger models need more regularization, we linearly increase dropout (Srivastava et al., 2014) ratio from 0.2 for EfficientNet-B0 to 0.5 for EfficientNet-B7. (A sketch of these schedules follows the table.) |
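To make the compound-scaling claim quoted under "Research Type" concrete, here is a minimal Python sketch of the paper's compound coefficient: depth, width, and resolution are each scaled by a constant ratio raised to the power φ, using the paper's constants α = 1.2, β = 1.1, γ = 1.15 (chosen so that α·β²·γ² ≈ 2). The helper name, the rounding of the resolution, and the mapping of φ to B0-B7 are illustrative assumptions, not the authors' released code.

```python
# Compound-scaling constants from the paper: depth = alpha**phi,
# width = beta**phi, resolution = gamma**phi, with alpha * beta**2 * gamma**2 ~= 2.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def scale_baseline(phi, base_resolution=224):
    """Return (depth multiplier, width multiplier, input resolution) for a given phi.

    Rounding the scaled resolution to an integer is an assumption for illustration;
    the released EfficientNet variants use hand-picked resolutions per model.
    """
    depth_mult = ALPHA ** phi
    width_mult = BETA ** phi
    resolution = int(round(base_resolution * GAMMA ** phi))
    return depth_mult, width_mult, resolution


if __name__ == "__main__":
    # Roughly corresponds to EfficientNet-B0 .. B7 (the exact mapping is assumed).
    for phi in range(8):
        d, w, r = scale_baseline(phi)
        print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution ~{r}px")
```

Because α·β²·γ² ≈ 2, each unit increase of φ roughly doubles the FLOPs, which is why the paper states the constraint in that form.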
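The "Experiment Setup" row quotes concrete hyperparameters; the short sketch below spells out two of the quoted schedules: the exponential learning-rate decay (×0.97 every 2.4 epochs from 0.256) and the linear dropout ramp from 0.2 for B0 to 0.5 for B7. The function names and the per-variant interpolation granularity are assumptions for illustration; this is not the authors' training code and it omits the RMSProp and weight-decay wiring.

```python
# Hyperparameter values are quoted from the paper; everything else is assumed.
INITIAL_LR = 0.256
DECAY_RATE = 0.97
DECAY_EVERY_EPOCHS = 2.4


def learning_rate(epoch: float) -> float:
    """Exponential decay: the learning rate shrinks by 0.97 every 2.4 epochs."""
    return INITIAL_LR * DECAY_RATE ** (epoch / DECAY_EVERY_EPOCHS)


def dropout_rate(variant: int) -> float:
    """Linear ramp of the dropout ratio from 0.2 (B0) to 0.5 (B7)."""
    assert 0 <= variant <= 7
    return 0.2 + (0.5 - 0.2) * variant / 7


if __name__ == "__main__":
    print([round(learning_rate(e), 4) for e in (0, 2.4, 24, 120)])  # lr over training
    print([round(dropout_rate(v), 2) for v in range(8)])            # dropout per variant
```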