CPN: Complementary Proposal Network for Unconstrained Text Detection
Authors: Longhuang Wu, Shangxuan Tian, Youxin Wang, Pengfei Xiong
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Specifically, our approach achieves improvements of 3.6%, 1.3%, and 1.0% on the challenging benchmarks ICDAR19-ArT, IC15, and MSRA-TD500, respectively. |
| Researcher Affiliation | Collaboration | Longhuang Wu, Shangxuan Tian*, Youxin Wang, Pengfei Xiong Shopee Pte. Ltd. {wlonghuang, wyxileroy, xiongpengfei2019}@gmail.com, tianshangxuan@u.nus.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or an explicitly labeled algorithm block. Figures illustrate the network architecture but not algorithmic steps in pseudocode format. |
| Open Source Code | No | Code for our method will be released. |
| Open Datasets | Yes | We adopt five widely studied datasets, IC19-ArT (Chng et al. 2019), CTW1500 (Yuliang et al. 2017), IC17-MLT (Nayef et al. 2017), IC15 (Karatzas et al. 2015), and MSRA-TD500 (Yao, Bai, and Liu 2014), which contain a variety of different scenarios, to evaluate the performance of our proposed complementary network. |
| Dataset Splits | Yes | On the IC17-MLT dataset, we train the model for 75 epochs without using extra data such as SynthText. The initial learning rate is set to 1 × 10⁻⁴ and divided by 10 at epochs 65 and 70. For the rest of the datasets, we fine-tune the model with their corresponding train sets on the previous IC17-MLT model. During fine-tuning, the model is trained for 24 epochs with an initial learning rate set to 5 × 10⁻⁵ and decayed by 0.1 after 20 epochs. Experiments are typically conducted on the curved CTW1500 test set and the multilingual IC17-MLT validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details such as the GPU or CPU models used for running experiments. It mentions 'GPU computations' in general terms but gives no concrete specifications. |
| Software Dependencies | No | The paper mentions software components such as AdamW, ResNet50, and Mask R-CNN, but does not specify version numbers for these or for any other software libraries or frameworks used in the experiments. |
| Experiment Setup | Yes | All the networks are optimized with AdamW (Loshchilov and Hutter 2017) with the batch size set to 16. On the IC17-MLT dataset, we train the model for 75 epochs without using extra data such as SynthText. The initial learning rate is set to 1 × 10⁻⁴ and divided by 10 at epochs 65 and 70. For the rest of the datasets, we fine-tune the model with their corresponding train sets on the previous IC17-MLT model. During fine-tuning, the model is trained for 24 epochs with an initial learning rate set to 5 × 10⁻⁵ and decayed by 0.1 after 20 epochs. Three augmentation schemes are implemented for training: 1) each side of the images is randomly re-scaled within the range of [480, 2560] without maintaining the aspect ratio, 2) each image is randomly flipped horizontally and rotated within the range of [−10°, 10°], 3) 640 × 640 random samples are cropped from each transformed image. (Minimal sketches of this schedule and of the augmentations follow the table.) |
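
The two-stage optimization schedule quoted above (AdamW, batch size 16, IC17-MLT pre-training followed by per-dataset fine-tuning) can be made concrete with a short PyTorch sketch. This is a minimal reconstruction under stated assumptions: the CPN code has not been released, so the stand-in model and random batch below only illustrate the learning-rate milestones, not the authors' pipeline.

```python
# Minimal sketch of the reported schedule: AdamW, batch size 16,
# 75 epochs on IC17-MLT with lr 1e-4 divided by 10 at epochs 65 and 70,
# then 24 fine-tuning epochs with lr 5e-5 decayed by 0.1 after epoch 20.
# The CPN implementation is not public; the tiny stand-in model and the
# random batch below are placeholders used only to make the schedule concrete.
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import MultiStepLR

model = nn.Conv2d(3, 1, kernel_size=3, padding=1)  # placeholder for the ResNet-50-based detector
batch = torch.randn(16, 3, 640, 640)                # batch size 16 of 640 x 640 crops

def run_stage(epochs: int, lr: float, milestones: list[int]) -> None:
    optimizer = AdamW(model.parameters(), lr=lr)
    scheduler = MultiStepLR(optimizer, milestones=milestones, gamma=0.1)
    for _ in range(epochs):
        # one illustrative step per epoch; a real run iterates the full train loader
        loss = model(batch).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()

run_stage(epochs=75, lr=1e-4, milestones=[65, 70])  # IC17-MLT pre-training
run_stage(epochs=24, lr=5e-5, milestones=[20])      # per-dataset fine-tuning
```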
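
The three augmentation schemes can be sketched in the same spirit. The torchvision-based function below is an assumed reconstruction that operates on the image only; the authors' preprocessing code is not available, and a real pipeline would apply the same geometric transforms to the text-box annotations as well.

```python
# Assumed reconstruction of the three augmentation schemes listed in the table,
# applied to the image only; the matching transform of text-box annotations is omitted.
import random
from PIL import Image
import torchvision.transforms.functional as F

def augment(image: Image.Image) -> Image.Image:
    # 1) rescale each side independently within [480, 2560], aspect ratio not preserved
    image = image.resize((random.randint(480, 2560), random.randint(480, 2560)))

    # 2) random horizontal flip, then rotation within [-10, 10] degrees
    if random.random() < 0.5:
        image = F.hflip(image)
    image = F.rotate(image, angle=random.uniform(-10.0, 10.0))

    # 3) random 640 x 640 crop (pad first if a side came out shorter than 640)
    if image.width < 640 or image.height < 640:
        image = F.pad(image, [0, 0, max(0, 640 - image.width), max(0, 640 - image.height)])
    left = random.randint(0, image.width - 640)
    top = random.randint(0, image.height - 640)
    return image.crop((left, top, left + 640, top + 640))
```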