DropBlock: A regularization method for convolutional networks
Authors: Golnaz Ghiasi, Tsung-Yi Lin, Quoc V. Le
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that DropBlock works better than dropout in regularizing convolutional networks. On ImageNet classification, the ResNet-50 architecture with DropBlock achieves 78.13% accuracy, which is more than a 1.6% improvement over the baseline. On COCO detection, DropBlock improves the Average Precision of RetinaNet from 36.8% to 38.4%. |
| Researcher Affiliation | Industry | Golnaz Ghiasi (Google Brain), Tsung-Yi Lin (Google Brain), Quoc V. Le (Google Brain) |
| Pseudocode | Yes | Algorithm 1 DropBlock (see the sketch after this table) |
| Open Source Code | Yes | The code for these results is at https://github.com/tensorflow/tpu/tree/master/models/official/resnet. The paper also links https://github.com/tensorflow/tpu/tree/master/models/experimental/amoeba_net and https://github.com/tensorflow/tpu/tree/master/models/official/retinanet |
| Open Datasets | Yes | The ILSVRC 2012 classification dataset [25] contains 1.2 million training images, 50,000 validation images, and 150,000 testing images. Images are labeled with 1,000 categories. COCO dataset [30]. PASCAL VOC 2012 dataset. |
| Dataset Splits | Yes | The ILSVRC 2012 classification dataset [25] contains 1.2 million training images, 50,000 validation images, and 150,000 testing images. Following the common practice, we report classification accuracy on the validation set. |
| Hardware Specification | Yes | We trained models on Tensor Processing Units (TPUs). The models were trained on TPU with 64 images in a batch. |
| Software Dependencies | No | The paper mentions using "TensorFlow implementations" but does not specify a version number for TensorFlow or any other software dependencies. |
| Experiment Setup | Yes | We used the default image size (224×224 for ResNet-50 and 331×331 for AmoebaNet), batch size (1024 for ResNet-50 and 2048 for AmoebaNet) and hyperparameter settings for all the models. We only increased the number of training epochs from 90 to 300 for the ResNet-50 architecture. The learning rate was decayed by a factor of 0.1 at 100, 200 and 265 epochs. AmoebaNet models were trained for 340 epochs with an exponential decay scheme for the learning rate. The RetinaNet model was trained for 150 epochs (280k training steps). An initial learning rate of 0.08 was used for the first 120 epochs and decayed by 0.1 at 120 and 140 epochs. We used α = 0.25 and γ = 1.5 for focal loss. We used a weight decay of 0.0001 and a momentum of 0.9. (A schedule sketch follows the table.) |
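
To make the pseudocode row concrete, below is a minimal sketch of DropBlock (Algorithm 1) applied to a single 2D feature map. It assumes NumPy rather than the TensorFlow release linked above, and uses the γ formula from the paper; the function name and signature are illustrative, not the authors' API.

```python
import numpy as np

def dropblock(x, keep_prob=0.9, block_size=7, training=True, rng=None):
    """Apply DropBlock to a single HxW feature map (sketch, not the official code)."""
    if not training:
        # Like dropout, DropBlock is disabled at inference time.
        return x
    if rng is None:
        rng = np.random.default_rng()

    h, w = x.shape
    # gamma: how many block centers to sample so that roughly (1 - keep_prob)
    # of the activations end up dropped (Section 3 of the paper).
    gamma = ((1.0 - keep_prob) / block_size**2) * (h * w) / (
        (h - block_size + 1) * (w - block_size + 1))

    # Sample block centers only where a full block_size x block_size square fits.
    half = block_size // 2
    centers = rng.random((h, w)) < gamma
    centers[:half, :] = False
    centers[h - half:, :] = False
    centers[:, :half] = False
    centers[:, w - half:] = False

    # Zero out a square block around every sampled center.
    mask = np.ones((h, w), dtype=x.dtype)
    for i, j in zip(*np.nonzero(centers)):
        mask[i - half:i + half + 1, j - half:j + half + 1] = 0.0

    # Rescale so the expected magnitude of the activations is unchanged.
    return x * mask * (mask.size / max(mask.sum(), 1.0))
```

Note that the paper's best results do not use a fixed keep_prob: the scheduled DropBlock variant decreases keep_prob linearly from 1.0 to the target value over training.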
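
The step-decay schedule quoted in the experiment-setup row can also be sketched briefly. The sketch below uses only the values stated for the RetinaNet run (initial rate 0.08, decay by a factor of 0.1 at epochs 120 and 140, 150 epochs total); the function itself is illustrative and not taken from the released code.

```python
def retinanet_learning_rate(epoch, base_lr=0.08, boundaries=(120, 140), decay=0.1):
    """Step-decay learning rate for a given training epoch (0-indexed)."""
    lr = base_lr
    for boundary in boundaries:
        if epoch >= boundary:
            lr *= decay
    return lr

# Epochs 0-119 train at 0.08, epochs 120-139 at 0.008, and epochs 140-149 at 0.0008.
```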