Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing

Authors: Yadong Qu, Yuxin Wang, Bangbang Zhou, Zixiao Wang, Hongtao Xie, Yongdong Zhang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiment results show that our method achieves SOTA performance (94.7% and 70.9% average accuracy on common benchmarks and Union14M-Benchmark). Code will be available at https://github.com/qqqyd/ViSu.
Researcher Affiliation | Academia | Yadong Qu, Yuxin Wang, Bangbang Zhou, Zixiao Wang, Hongtao Xie, Yongdong Zhang; University of Science and Technology of China, Hefei, China; {qqqyd, bangzhou01, wzx99}@mail.ustc.edu.cn, {wangyx58, htxie, zhyd73}@ustc.edu.cn
Pseudocode | No | The paper includes mathematical formulations of loss functions but does not present any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | Code will be available at https://github.com/qqqyd/ViSu.
Open Datasets | Yes | SL includes two widely used synthetic datasets, MJSynth [14] and SynthText [12], which contain 9M and 7M synthetic images. For real data without annotations, we adopt Union14M-U [15] with a total of 10M refined images from Book32 [13], CC [32], and Open Images [18].
Dataset Splits | No | The paper lists benchmark datasets used for evaluation (which typically include test sets) but does not explicitly describe train/validation/test splits, provide split percentages, or mention a dedicated validation set.
Hardware Specification | Yes | ViSu is trained on 4 NVIDIA RTX 4090 GPUs.
Software Dependencies | No | The paper mentions software components like the AdamW optimizer but does not specify version numbers for any programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | All images are resized to 100×32, and the patch size is 8×4. The maximum length T is set to 25. The character set size is 36, including 10 digits and 26 alphabets. For training settings, the network is trained in an end-to-end manner without pre-training. We adopt the AdamW optimizer and a one-cycle [35] learning rate scheduler with a maximum learning rate of 6e-4. The batch size is 384 for both synthetic data and real unlabeled data. We set the EMA smoothing factor α = 0.999, aspect ratio threshold r = 1.3, confidence thresholds ηccr = 0.5 and ηcua = 0.7, and temperature factor τ = 0.1.
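
The optimizer and scheduler portion of this setup can be expressed as a training configuration. The following is a minimal sketch assuming PyTorch (the paper does not name its framework or versions); the model, total step count, and loss computation are placeholders, while the learning rate, scheduler type, batch size, and EMA factor follow the values quoted above.

```python
# Hedged sketch of the reported optimization setup, assuming PyTorch.
# Model architecture, total_steps, and the training loop are placeholders;
# lr = 6e-4, one-cycle schedule, batch size 384, and EMA alpha = 0.999
# come from the quoted experiment setup.
import copy
import torch

model = torch.nn.Linear(10, 36)      # placeholder for the recognition network
ema_model = copy.deepcopy(model)     # EMA teacher, updated with alpha = 0.999

optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4)
total_steps = 100_000                # placeholder; not reported in this section
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=6e-4, total_steps=total_steps
)

ALPHA = 0.999  # EMA smoothing factor from the paper

def update_ema(student, teacher, alpha=ALPHA):
    """Exponential moving average update of the teacher weights."""
    with torch.no_grad():
        for p_s, p_t in zip(student.parameters(), teacher.parameters()):
            p_t.mul_(alpha).add_(p_s, alpha=1.0 - alpha)

# Per step (loss computation omitted):
#   loss.backward(); optimizer.step(); scheduler.step()
#   update_ema(model, ema_model)
```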