Real-Time Scene Text Detection with Differentiable Binarization
Authors: Minghui Liao, Zhaoyi Wan, Cong Yao, Kai Chen, Xiang Bai
Pages: 11474-11481
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Based on a simple segmentation network, we validate the performance improvements of DB on five benchmark datasets, which consistently achieves state-of-the-art results, in terms of both detection accuracy and speed. In particular, with a light-weight backbone, the performance improvements by DB are significant so that we can look for an ideal tradeoff between detection accuracy and efficiency. Specifically, with a backbone of ResNet-18, our detector achieves an F-measure of 82.8, running at 62 FPS, on the MSRA-TD500 dataset. |
| Researcher Affiliation | Collaboration | Minghui Liao (1), Zhaoyi Wan (2), Cong Yao (2), Kai Chen (3, 4), Xiang Bai (1). (1) Huazhong University of Science and Technology; (2) Megvii; (3) Shanghai Jiao Tong University; (4) Onlyou Tech. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at: https://github.com/MhLiao/DB. |
| Open Datasets | Yes | Datasets: SynthText (Gupta, Vedaldi, and Zisserman 2016) is a synthetic dataset which consists of 800k images. ... MLT-2017 dataset is a multi-language dataset. ... ICDAR 2015 dataset (Karatzas et al. 2015) ... MSRA-TD500 dataset (Yao et al. 2012) ... CTW1500 (Liu et al. 2019a) ... Total-Text (Ch'ng and Chan 2017) ... |
| Dataset Splits | Yes | MLT-2017 dataset ... There are 7,200 training images, 1,800 validation images and 9,000 testing images in this dataset. We use both the training set and the validation set in the finetune period. |
| Hardware Specification | Yes | The inference speed is tested with a batch size of 1, with a single 1080ti GPU in a single thread. |
| Software Dependencies | No | The paper refers only to general components such as CNNs. It does not name a deep learning framework or list any software libraries with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | For all the models, we first pre-train them with the SynthText dataset for 100k iterations. Then, we finetune the models on the corresponding real-world datasets for 1200 epochs. The training batch size is set to 16. We follow a poly learning rate policy where the learning rate at current iteration equals the initial learning rate multiplying $\left(1 - \frac{\text{iter}}{\text{max\_iter}}\right)^{\text{power}}$, where the initial learning rate is set to 0.007 and power is 0.9. We use a weight decay of 0.0001 and a momentum of 0.9. The max iter means the maximum iterations, which depends on the maximum epochs. The data augmentation for the training data includes: (1) random rotation with an angle range of $(-10^\circ, 10^\circ)$; (2) random cropping; (3) random flipping. All the processed images are resized to $640 \times 640$ for better training efficiency. |
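The poly learning-rate policy in the Experiment Setup row can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code. The function names `poly_lr` and `max_iterations` are hypothetical. The way `max_iterations` derives the iteration budget from the epoch count is also an assumption: it treats one epoch as one pass over the data and keeps the last partial batch. The paper says only that max iter depends on the maximum epochs.

```python
import math


def poly_lr(iteration: int, max_iter: int, base_lr: float = 0.007, power: float = 0.9) -> float:
    """Poly policy: base_lr * (1 - iteration / max_iter) ** power."""
    if max_iter <= 0:
        raise ValueError("max_iter must be positive")
    # Clamp progress to [0, 1] so the rate never becomes negative after the last iteration.
    progress = min(max(iteration, 0), max_iter) / max_iter
    return base_lr * (1.0 - progress) ** power


def max_iterations(num_images: int, batch_size: int = 16, epochs: int = 1200) -> int:
    """Hypothetical helper (assumption): one epoch = ceil(num_images / batch_size) iterations."""
    iters_per_epoch = math.ceil(num_images / batch_size)
    return iters_per_epoch * epochs


if __name__ == "__main__":
    # Hypothetical training-set size, chosen only for illustration.
    total = max_iterations(num_images=1000)
    for it in (0, total // 2, total - 1, total):
        print(it, round(poly_lr(it, total), 6))
```

Suppose the training set has a hypothetical 1,000 images. The budget is then 63 × 1200 = 75,600 iterations, and the rate decays smoothly from 0.007 at iteration 0 to 0 at the final iteration.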
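The speed protocol in the Hardware Specification row (batch size 1, a single thread) amounts to timing images one at a time. The sketch below assumes a hypothetical `infer` callable that processes a single image and returns only once the result is ready. With an asynchronous GPU framework, that callable would need to synchronize the device before returning; otherwise the measured FPS will be inflated. The warm-up count is an illustrative choice, not a value from the paper.

```python
import time
from typing import Callable, Sequence


def measure_fps(infer: Callable[[object], object], inputs: Sequence[object], warmup: int = 5) -> float:
    """Average frames per second for serial, single-image (batch size 1) inference."""
    if not inputs:
        raise ValueError("need at least one input")
    # Untimed warm-up calls absorb one-off costs such as lazy initialisation.
    for x in inputs[:warmup]:
        infer(x)
    start = time.perf_counter()
    for x in inputs:
        infer(x)  # one image per call, processed sequentially in this thread
    elapsed = time.perf_counter() - start
    return len(inputs) / elapsed


if __name__ == "__main__":
    # Stand-in "model" that takes about 10 ms per image; a real detector would replace it.
    print(round(measure_fps(lambda x: time.sleep(0.01), list(range(20))), 1))
```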