DropBlock: A regularization method for convolutional networks

Authors: Golnaz Ghiasi, Tsung-Yi Lin, Quoc V. Le

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that DropBlock works better than dropout in regularizing convolutional networks. On ImageNet classification, ResNet-50 architecture with DropBlock achieves 78.13% accuracy, which is more than 1.6% improvement on the baseline. On COCO detection, DropBlock improves Average Precision of RetinaNet from 36.8% to 38.4%.
Researcher Affiliation | Industry | Golnaz Ghiasi (Google Brain), Tsung-Yi Lin (Google Brain), Quoc V. Le (Google Brain)
Pseudocode | Yes | Algorithm 1: DropBlock (see the sketch after this table)
Open Source Code | Yes | The code of these results is in https://github.com/tensorflow/tpu/tree/master/models/official/resnet; related repositories: https://github.com/tensorflow/tpu/tree/master/models/experimental/amoeba_net and https://github.com/tensorflow/tpu/tree/master/models/official/retinanet
Open Datasets | Yes | The ILSVRC 2012 classification dataset [25] contains 1.2 million training images, 50,000 validation images, and 150,000 testing images. Images are labeled with 1,000 categories. COCO dataset [30]. PASCAL VOC 2012 dataset.
Dataset Splits | Yes | The ILSVRC 2012 classification dataset [25] contains 1.2 million training images, 50,000 validation images, and 150,000 testing images. Following the common practice, we report classification accuracy on the validation set.
Hardware Specification | Yes | We trained models on Tensor Processing Units (TPUs). The models were trained on TPU with 64 images in a batch.
Software Dependencies | No | The paper mentions using "TensorFlow implementations" but does not specify a version number for TensorFlow or any other software dependencies.
Experiment Setup | Yes | We used the default image size (224 × 224 for ResNet-50 and 331 × 331 for AmoebaNet), batch size (1024 for ResNet-50 and 2048 for AmoebaNet) and hyperparameter settings for all the models. We only increased the number of training epochs from 90 to 300 for the ResNet-50 architecture. The learning rate was decayed by a factor of 0.1 at 100, 200 and 265 epochs. AmoebaNet models were trained for 340 epochs and an exponential decay scheme was used for scheduling the learning rate. The model was trained for 150 epochs (280k training steps). The initial learning rate of 0.08 was applied for the first 120 epochs and decayed by 0.1 at 120 and 140 epochs. We used α = 0.25 and γ = 1.5 for focal loss. We used a weight decay of 0.0001 and a momentum of 0.9.
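The Pseudocode row above refers to Algorithm 1 (DropBlock) in the paper: sample sparse block centers, grow each into a contiguous square of zeros, mask the activations, and rescale. The following is a minimal NumPy sketch of that idea, not the authors' released TensorFlow/TPU implementation; the function name, arguments, and the exact way the seed probability is derived from keep_prob are illustrative assumptions.

```python
import numpy as np

def dropblock(feat, keep_prob=0.9, block_size=7, training=True, rng=None):
    """Minimal sketch of the DropBlock idea for a single HxWxC feature map.

    Illustrative re-implementation with placeholder names; see the paper's
    Algorithm 1 and the linked repositories for the authors' code.
    """
    if not training or keep_prob == 1.0:
        return feat  # DropBlock is only applied during training
    rng = rng or np.random.default_rng()
    h, w, c = feat.shape
    # Seed probability gamma, following the paper's heuristic: choose it so
    # that roughly (1 - keep_prob) of the units end up dropped once every
    # seed grows into a block_size x block_size square in the valid region.
    gamma = ((1.0 - keep_prob) / block_size ** 2) * (h * w) / (
        (h - block_size + 1) * (w - block_size + 1))
    half = block_size // 2
    # Sample block centers only where a full block fits inside the map.
    valid = np.zeros((h, w, c), dtype=bool)
    valid[half:h - half, half:w - half, :] = True
    centers = (rng.random((h, w, c)) < gamma) & valid
    # Expand every sampled center into a square of zeros in the mask.
    mask = np.ones((h, w, c), dtype=feat.dtype)
    for i, j, k in zip(*np.nonzero(centers)):
        mask[i - half:i + half + 1, j - half:j + half + 1, k] = 0.0
    # Apply the mask, then rescale so the expected activation sum is kept.
    return feat * mask * (mask.size / max(mask.sum(), 1.0))
```

At inference time the function returns the activations unchanged, mirroring the `mode == Inference` branch described in the paper's algorithm; the final rescaling plays the same role as the count(M)/count_ones(M) normalization step.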
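The ResNet-50 schedule quoted in the Experiment Setup row (decay by a factor of 0.1 at epochs 100, 200 and 265) amounts to a simple step-decay rule. A minimal sketch, assuming a placeholder base learning rate, since the report does not state the ResNet-50 starting rate:

```python
def step_decay_lr(epoch, base_lr=0.1, boundaries=(100, 200, 265), factor=0.1):
    """Step-decay schedule matching the quoted ResNet-50 setup.

    `base_lr` is an assumed placeholder; only the decay epochs and the
    0.1 factor are stated in the report.
    """
    lr = base_lr
    for boundary in boundaries:
        if epoch >= boundary:
            lr *= factor
    return lr
```

For the quoted RetinaNet setup one would instead start from 0.08 and use boundaries (120, 140), while the AmoebaNet runs used an exponential rather than step decay.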