CPN: Complementary Proposal Network for Unconstrained Text Detection

Authors: Longhuang Wu, Shangxuan Tian, Youxin Wang, Pengfei Xiong

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Specifically, our approach achieves improvements of 3.6%, 1.3% and 1.0% on challenging benchmarks ICDAR19-ArT, IC15, and MSRA-TD500, respectively.
Researcher Affiliation | Collaboration | Longhuang Wu, Shangxuan Tian*, Youxin Wang, Pengfei Xiong; Shopee Pte. Ltd.; {wlonghuang, wyxileroy, xiongpengfei2019}@gmail.com, tianshangxuan@u.nus.edu
Pseudocode | No | The paper does not contain structured pseudocode or an explicitly labeled algorithm block. Figures illustrate the network architecture but not algorithmic steps in pseudocode format.
Open Source Code | No | "Code for our method will be released."
Open Datasets | Yes | We adopt five widely studied datasets, IC19-ArT (Chng et al. 2019), CTW1500 (Yuliang et al. 2017), IC17-MLT (Nayef et al. 2017), IC15 (Karatzas et al. 2015), and MSRA-TD500 (Yao, Bai, and Liu 2014), which contain a variety of different scenarios, to evaluate the performance of our proposed complementary network.
Dataset Splits | Yes | On the IC17-MLT dataset, we train the model for 75 epochs without using extra data such as SynthText. The initial learning rate is set to 1×10⁻⁴ and divided by 10 at 65 and 70 epochs. For the rest of the datasets, we fine-tune the model with their corresponding train sets on the previous IC17-MLT model. During fine-tuning, the model is trained for 24 epochs with an initial learning rate set to 5×10⁻⁵ and decayed by 0.1 after 20 epochs. Experiments are typically conducted on the curved CTW1500 test set and the multilingual IC17-MLT validation set.
Hardware Specification | No | The paper does not provide specific hardware details such as the GPU or CPU models used to run the experiments. It mentions GPU computation in general terms but gives no concrete specifications.
Software Dependencies | No | The paper mentions software components such as 'AdamW', 'ResNet50', and 'Mask R-CNN' but does not specify version numbers for these or any other software libraries or frameworks used in the experiments.
Experiment Setup | Yes | All the networks are optimized with AdamW (Loshchilov and Hutter 2017) with batch size set to 16. On the IC17-MLT dataset, we train the model for 75 epochs without using extra data such as SynthText. The initial learning rate is set to 1×10⁻⁴ and divided by 10 at 65 and 70 epochs. For the rest of the datasets, we fine-tune the model with their corresponding train sets on the previous IC17-MLT model. During fine-tuning, the model is trained for 24 epochs with an initial learning rate set to 5×10⁻⁵ and decayed by 0.1 after 20 epochs. Three augmentation schemes are implemented for training: 1) each side of the images is randomly re-scaled within the range of [480, 2560] without maintaining the aspect ratio, 2) each image is randomly flipped horizontally and rotated within the range of [-10°, 10°], 3) 640×640 random samples are cropped from each transformed image.
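To make the quoted training schedule concrete, here is a minimal PyTorch-style sketch of the optimizer and learning-rate settings reported above (AdamW, batch size 16, initial learning rate 1×10⁻⁴ divided by 10 at epochs 65 and 70 for IC17-MLT pre-training, then 5×10⁻⁵ decayed by 0.1 after epoch 20 for fine-tuning). This is not the authors' released code; the placeholder model and the loop body are illustrative assumptions only.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import MultiStepLR

# Placeholder module; the paper builds on a ResNet50-based, Mask R-CNN-style detector.
model = torch.nn.Conv2d(3, 64, kernel_size=3)

# IC17-MLT pre-training: 75 epochs, batch size 16, initial LR 1e-4,
# divided by 10 at epochs 65 and 70.
optimizer = AdamW(model.parameters(), lr=1e-4)
scheduler = MultiStepLR(optimizer, milestones=[65, 70], gamma=0.1)

for epoch in range(75):
    # ... one training epoch over batches of 16 images goes here ...
    scheduler.step()

# Fine-tuning on the remaining benchmarks: 24 epochs, initial LR 5e-5,
# decayed by 0.1 after epoch 20.
ft_optimizer = AdamW(model.parameters(), lr=5e-5)
ft_scheduler = MultiStepLR(ft_optimizer, milestones=[20], gamma=0.1)
```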
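Similarly, a rough sketch of the three augmentation steps listed in the experiment setup, assuming OpenCV/NumPy image arrays. The function and parameter choices are illustrative assumptions rather than the authors' implementation, and a real detection pipeline would also transform the box/polygon annotations together with the image.

```python
import random
import numpy as np
import cv2

def augment(image: np.ndarray) -> np.ndarray:
    # 1) Randomly re-scale each side within [480, 2560] without keeping the aspect ratio.
    new_w = random.randint(480, 2560)
    new_h = random.randint(480, 2560)
    image = cv2.resize(image, (new_w, new_h))

    # 2) Random horizontal flip and rotation within [-10, 10] degrees.
    if random.random() < 0.5:
        image = cv2.flip(image, 1)
    angle = random.uniform(-10.0, 10.0)
    h, w = image.shape[:2]
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    image = cv2.warpAffine(image, rot, (w, h))

    # 3) Crop a random 640x640 sample from the transformed image.
    # If a side is shorter than 640 the crop is clipped here; real pipelines typically pad instead.
    h, w = image.shape[:2]
    top = random.randint(0, max(h - 640, 0))
    left = random.randint(0, max(w - 640, 0))
    return image[top:top + 640, left:left + 640]
```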