EfficientNetV2: Smaller Models and Faster Training

Authors: Mingxing Tan, Quoc Le

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that EfficientNetV2 models train much faster than state-of-the-art models while being up to 6.8x smaller. With progressive learning, our EfficientNetV2 significantly outperforms previous models on ImageNet and CIFAR/Cars/Flowers datasets.
Researcher Affiliation | Industry | Google Research, Brain Team. Correspondence to: Mingxing Tan <tanmingxing@google.com>.
Pseudocode | Yes | Algorithm 1: Progressive learning with adaptive regularization. (A hedged code sketch of this algorithm appears after the table.)
Open Source Code | Yes | Code is available at https://github.com/google/automl/tree/master/efficientnetv2.
Open Datasets | Yes | ImageNet ILSVRC2012 (Russakovsky et al., 2015) contains about 1.28M training images and 50,000 validation images with 1000 classes. ImageNet21k (Russakovsky et al., 2015) contains about 13M training images with 21,841 classes. We evaluate our models on four transfer learning datasets: CIFAR-10, CIFAR-100, Flowers and Cars. Table 9 includes the statistics of these datasets.
Dataset Splits | Yes | ImageNet ILSVRC2012 (Russakovsky et al., 2015) contains about 1.28M training images and 50,000 validation images with 1000 classes. During architecture search or hyperparameter tuning, we reserve 25,000 images (about 2%) from the training set as minival for accuracy evaluation.
Hardware Specification | Yes | Training time is measured with 32 TPU cores. All EfficientNetV2 models are trained with progressive learning. Our EfficientNetV2 and progressive learning also make it easier to train models on larger datasets. For example, ImageNet21k (Russakovsky et al., 2015) is about 10x larger than ImageNet ILSVRC2012, but our EfficientNetV2 can finish the training within two days using moderate computing resources of 32 TPUv3 cores. Inference time is measured on a V100 GPU in FP16 with batch size 16 using the same codebase (Wightman, 2021); training time is the total training time normalized for 32 TPU cores.
Software Dependencies | No | The paper mentions using 'RMSProp optimizer', 'batch norm', 'RandAugment', 'Mixup', 'Dropout', 'stochastic depth', and the 'PyTorch Image Models codebase (Wightman, 2021)'. However, it does not provide specific version numbers for PyTorch or other software libraries.
Experiment Setup | Yes | Our ImageNet training settings largely follow EfficientNets (Tan & Le, 2019a): RMSProp optimizer with decay 0.9 and momentum 0.9; batch norm momentum 0.99; weight decay 1e-5. Each model is trained for 350 epochs with total batch size 4096. Learning rate is first warmed up from 0 to 0.256, and then decayed by 0.97 every 2.4 epochs. We use exponential moving average with 0.9999 decay rate, RandAugment (Cubuk et al., 2020), Mixup (Zhang et al., 2018), Dropout (Srivastava et al., 2014), and stochastic depth (Huang et al., 2016) with 0.8 survival probability. For progressive learning, we divide the training process into four stages with about 87 epochs per stage: the early stage uses a small image size with weak regularization, while the later stages use larger image sizes with stronger regularization, as described in Algorithm 1. Table 6 shows the minimum (for the first stage) and maximum (for the last stage) values of image size and regularization. (A sketch of the reported learning-rate schedule also appears after the table.)
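
The Pseudocode row references Algorithm 1, progressive learning with adaptive regularization: training is split into stages, and both the image size and the regularization strengths are linearly interpolated from their minimum (first stage) to maximum (last stage) values. The sketch below illustrates that idea only; the stage count matches the quoted setup, but the specific min/max values are illustrative placeholders rather than the paper's Table 6 numbers, and `train_one_epoch` is a hypothetical stand-in for a real training loop.

```python
# Minimal sketch of progressive learning with adaptive regularization
# (Algorithm 1). Min/max values are placeholders, not Table 6 settings.

def interp(lo, hi, stage, num_stages):
    """Linearly interpolate from `lo` (first stage) to `hi` (last stage)."""
    frac = stage / (num_stages - 1) if num_stages > 1 else 1.0
    return lo + (hi - lo) * frac

def train_one_epoch(image_size, dropout, randaug_magnitude, mixup_alpha):
    # Placeholder: a real implementation would rebuild the input pipeline at
    # `image_size` and apply the given regularization strengths.
    pass

def progressive_training(total_epochs=350, num_stages=4,
                         image_size=(128, 300),   # (min, max) input resolution
                         dropout=(0.1, 0.3),      # (min, max) dropout rate
                         randaug=(5, 15),         # (min, max) RandAugment magnitude
                         mixup=(0.0, 0.2)):       # (min, max) Mixup alpha
    epochs_per_stage = total_epochs // num_stages  # about 87 epochs per stage
    for stage in range(num_stages):
        size = int(interp(*image_size, stage, num_stages))
        for _ in range(epochs_per_stage):
            train_one_epoch(image_size=size,
                            dropout=interp(*dropout, stage, num_stages),
                            randaug_magnitude=interp(*randaug, stage, num_stages),
                            mixup_alpha=interp(*mixup, stage, num_stages))
```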
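
As a worked example of the learning-rate schedule quoted in the Experiment Setup row (warm up from 0 to 0.256, then decay by 0.97 every 2.4 epochs), the sketch below computes the rate as a function of the epoch. The warmup length (assumed 5 epochs here) and the choice of continuous rather than staircase decay are assumptions, since neither is specified in the quoted text.

```python
# Hedged sketch of the reported schedule: linear warmup from 0 to 0.256,
# then exponential decay by 0.97 every 2.4 epochs. The 5-epoch warmup and
# continuous (non-staircase) decay are assumptions.

def learning_rate(epoch, base_lr=0.256, warmup_epochs=5.0,
                  decay_rate=0.97, decay_epochs=2.4):
    if epoch < warmup_epochs:
        return base_lr * epoch / warmup_epochs  # linear warmup from 0
    return base_lr * decay_rate ** ((epoch - warmup_epochs) / decay_epochs)

# Rough check over a 350-epoch run:
for e in (0, 5, 100, 350):
    print(f"epoch {e:3d}: lr = {learning_rate(e):.5f}")
```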