Context-Based Contrastive Learning for Scene Text Recognition
Authors: Xinyun Zhang, Binwu Zhu, Xufeng Yao, Qi Sun, Ruiyu Li, Bei Yu
AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that ConCLR significantly improves out-of-vocabulary generalization and achieves state-of-the-art performance on public benchmarks together with attention-based recognizers. ... In this section, we conduct extensive experiments to demonstrate the effectiveness of our proposed method. |
| Researcher Affiliation | Collaboration | 1The Chinese University of Hong Kong 2SmartMore {xyzhang21,bwzhu,xfyao,qsun,byu}@cse.cuhk.edu.hk, royliruiyu@gmail.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | The training set consists of two synthetic datasets, MJ (Jaderberg et al. 2016, 2014) and ST (Gupta, Vedaldi, and Zisserman 2016) |
| Dataset Splits | Yes | The training set consists of two synthetic datasets, MJ (Jaderberg et al. 2016, 2014) and ST (Gupta, Vedaldi, and Zisserman 2016), and evaluation is conducted on six public benchmarks, including ICDAR 2013 (IC13) (Karatzas et al. 2013), ICDAR 2015 (IC15) (Karatzas et al. 2015), IIIT 5K-Words (IIIT) (Mishra, Alahari, and Jawahar 2012), Street View Text (SVT) (Wang, Babenko, and Belongie 2011), Street View Text Perspective (SVTP) (Phan et al. 2013), and CUTE80 (CUTE) (Risnumawan et al. 2014), and our synthesized benchmark OutText. |
| Hardware Specification | Yes | All the experiments are conducted on four NVIDIA 2080Ti GPUs with batch size 384. |
| Software Dependencies | No | The paper mentions software components like ResNet, Transformer, and ADAM optimizer but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We use three transformer layers for the parallel attention module, with eight heads for each of them. Images are resized to 32 × 128 with common data augmentation, such as random rotation, affine transformation, and color jittering. We use ADAM as the optimizer, with a learning rate initialized to 1e-4 and decayed to 1e-5 at the 6th epoch. All the experiments are conducted on four NVIDIA 2080Ti GPUs with batch size 384. |
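
The "Experiment Setup" row translates fairly directly into a training configuration. Below is a minimal PyTorch/torchvision sketch of that setup (resize to 32 × 128, rotation/affine/color-jitter augmentation, ADAM with lr 1e-4 decayed to 1e-5 at the 6th epoch, batch size 384). The recognizer model, the augmentation magnitudes, and the total epoch count are placeholders assumed for illustration; the paper does not release code, so this is not the authors' implementation.

```python
import torch
from torch import nn, optim
from torchvision import transforms

# Images resized to 32x128 with common augmentations (rotation, affine
# transformation, color jittering), as stated in the paper.
# The specific magnitudes below are assumptions, not values from the paper.
train_transform = transforms.Compose([
    transforms.RandomRotation(degrees=15),                      # angle assumed
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),   # shift assumed
    transforms.ColorJitter(0.5, 0.5, 0.5, 0.1),                 # strengths assumed
    transforms.Resize((32, 128)),
    transforms.ToTensor(),
])

# Placeholder recognizer standing in for the ResNet + transformer model.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 37),
)

# ADAM optimizer, lr initialized to 1e-4 and decayed to 1e-5 at the 6th epoch.
optimizer = optim.Adam(model.parameters(), lr=1e-4)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[6], gamma=0.1)

batch_size = 384  # split across four 2080Ti GPUs in the paper

num_epochs = 10   # total epoch count not reported; assumed here
for epoch in range(num_epochs):
    # ... training loop over (image, label) batches would go here ...
    scheduler.step()
```

The only hyperparameters taken verbatim from the paper are the image size, the optimizer choice, the learning-rate schedule, and the batch size; everything else is scaffolding so the snippet runs end to end.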