reproducibilityindex.ai

Real-Time Scene Text Detection with Differentiable Binarization

Authors: Minghui Liao, Zhaoyi Wan, Cong Yao, Kai Chen, Xiang Bai (pp. 11474-11481)

AAAI 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Based on a simple segmentation network, we validate the performance improvements of DB on five benchmark datasets, which consistently achieves state-of-the-art results, in terms of both detection accuracy and speed. In particular, with a light-weight backbone, the performance improvements by DB are significant so that we can look for an ideal tradeoff between detection accuracy and efficiency. Specifically, with a backbone of ResNet-18, our detector achieves an F-measure of 82.8, running at 62 FPS, on the MSRA-TD500 dataset.
Researcher Affiliation	Collaboration	Minghui Liao¹, Zhaoyi Wan², Cong Yao², Kai Chen³,⁴, Xiang Bai¹; ¹Huazhong University of Science and Technology, ²Megvii, ³Shanghai Jiao Tong University, ⁴Onlyou Tech.
Pseudocode	No	The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code	Yes	Code is available at: https://github.com/MhLiao/DB.
Open Datasets	Yes	SynthText (Gupta, Vedaldi, and Zisserman 2016) is a synthetic dataset which consists of 800k images. ... MLT-2017 dataset is a multi-language dataset. ... ICDAR 2015 dataset (Karatzas et al. 2015)... MSRA-TD500 dataset (Yao et al. 2012)... CTW1500 dataset (Liu et al. 2019a)... Total-Text dataset (Ch'ng and Chan 2017)...
Dataset Splits	Yes	MLT-2017 dataset ... There are 7,200 training images, 1,800 validation images and 9,000 testing images in this dataset. We use both the training set and the validation set in the finetune period.
Hardware Specification	Yes	The inference speed is tested with a batch size of 1, with a single 1080ti GPU in a single thread.
Software Dependencies	No	The paper mentions general software components like CNN, PyTorch/TensorFlow (implied by typical ML frameworks), but does not specify any software libraries with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup	Yes	For all the models, we first pre-train them with the SynthText dataset for 100k iterations. Then, we finetune the models on the corresponding real-world datasets for 1200 epochs. The training batch size is set to 16. We follow a poly learning rate policy where the learning rate at current iteration equals the initial learning rate multiplying $\left(1 - \frac{iter}{max\_iter}\right)^{power}$, where the initial learning rate is set to 0.007 and power is 0.9. We use a weight decay of 0.0001 and a momentum of 0.9. The max_iter means the maximum iterations, which depends on the maximum epochs. The data augmentation for the training data includes: (1) Random rotation with an angle range of (−10°, 10°); (2) Random cropping; (3) Random Flipping. All the processed images are re-sized to 640×640 for better training efficiency.
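
The poly learning rate policy quoted in the Experiment Setup row can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `poly_lr` and `max_iterations` are hypothetical, the rounding of iterations per epoch (counting a final partial batch) is an assumption, and the dataset size of 1000 images in the usage lines is a placeholder. Only the initial learning rate (0.007), power (0.9), batch size (16), and epoch count (1200) come from the quoted text.

```python
import math


def max_iterations(num_train_images: int, epochs: int, batch_size: int) -> int:
    """Total training iterations for a given number of epochs.

    Assumption: a final partial batch still counts as one iteration,
    so iterations per epoch are rounded up.
    """
    if num_train_images <= 0 or epochs <= 0 or batch_size <= 0:
        raise ValueError("all arguments must be positive")
    return epochs * math.ceil(num_train_images / batch_size)


def poly_lr(initial_lr: float, cur_iter: int, max_iter: int, power: float = 0.9) -> float:
    """Poly policy: initial_lr * (1 - cur_iter / max_iter) ** power."""
    if max_iter <= 0:
        raise ValueError("max_iter must be positive")
    # Clamp the iteration index so the base of the power stays in [0, 1].
    progress = min(max(cur_iter, 0), max_iter) / max_iter
    return initial_lr * (1.0 - progress) ** power


if __name__ == "__main__":
    # Placeholder dataset size; epochs and batch size follow the quoted setup.
    total = max_iterations(num_train_images=1000, epochs=1200, batch_size=16)
    for it in (0, total // 2, total):
        print(it, poly_lr(0.007, it, total))
```

The rate starts at the initial value and decays to zero at the final iteration. With power below 1, the decay is slower than linear early on and steeper near the end.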