Learning to Draw Text in Natural Images with Conditional Adversarial Networks

Authors: Shancheng Fang, Hongtao Xie, Jianjun Chen, Jianlong Tan, Yongdong Zhang

IJCAI 2019

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "Experiments on SVHN dataset and ICDAR, IIIT5K datasets demonstrate our method is able to synthesize visually appealing text images. Besides, we also show the high-quality images synthesized by our method can be used to boost the performance of a scene text recognition algorithm." The experiments on the SVHN, ICDAR, and IIIT5K datasets show that STS-GAN can synthesize high-quality text images in complex environments.

Researcher Affiliation | Academia | Shancheng Fang (1,2), Hongtao Xie (3), Jianjun Chen (1), Jianlong Tan (1), Yongdong Zhang (3). (1) Institute of Information Engineering, Chinese Academy of Sciences; (2) School of Cyber Security, University of Chinese Academy of Sciences; (3) School of Information Science and Technology, University of Science and Technology of China.

Pseudocode | No | No pseudocode or algorithm blocks are provided in the paper.

Open Source Code | No | The paper does not include an unambiguous statement about releasing source code, nor a link to a code repository for the described methodology.

Open Datasets | Yes | "The first is Street View House Number (SVHN) [Netzer et al., 2011], which contains 10 classes from digit 1 to 10. There are 73257 training and 26032 test character images, and 33402 training and 13068 test word images in SVHN dataset. The second is an alphanumeric (62 characters) dataset composed of data from ICDAR 2003 [Lucas et al., 2003] and IIIT 5K-word [Mishra et al., 2012] datasets."

Dataset Splits | No | The paper specifies training and test splits with counts for the SVHN and IC03+IIIT datasets, but does not mention a separate validation split or its proportions.

Hardware Specification | No | No specific hardware details (such as GPU or CPU models, or cloud instance types) are provided for the experimental setup.

Software Dependencies | No | The paper mentions using the Adam optimizer but does not specify version numbers for any software dependencies such as deep learning frameworks, libraries, or solvers.

Experiment Setup | Yes | "We employ Adam optimizer as solver with momentum β1 = 0 and β2 = 0.999 for all the networks. The learning rates are 2×10⁻⁴ for Dc, 5×10⁻⁵ for Gc, and 2×10⁻⁴ for both Dw and Gw. For character model, batch size is set to 512. For word model, we sample 8 images with the same word length each batch. All the latent vector z is sampled from standard Gaussian distribution."
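The optimizer settings quoted above can be illustrated with a minimal, dependency-free sketch of a single Adam update. Only the hyperparameters (β1 = 0, β2 = 0.999) and the per-network learning rates for Dc, Gc, Dw, and Gw come from the paper; the function and the dummy parameter are illustrative, not the authors' implementation. Note that with β1 = 0 the first-moment estimate reduces to the raw gradient.

```python
# Hedged sketch: one scalar Adam step with the paper's hyperparameters.
# The network names Dc/Gc/Dw/Gw follow the paper; everything else is illustrative.

LEARNING_RATES = {   # per-network learning rates quoted in the paper
    "Dc": 2e-4,      # character discriminator
    "Gc": 5e-5,      # character generator
    "Dw": 2e-4,      # word discriminator
    "Gw": 2e-4,      # word generator
}

def adam_step(param, grad, m, v, t, lr, beta1=0.0, beta2=0.999, eps=1e-8):
    """Return updated (param, m, v) after one Adam step at timestep t >= 1."""
    m = beta1 * m + (1.0 - beta1) * grad          # first moment (== grad when beta1 = 0)
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second moment
    m_hat = m / (1.0 - beta1 ** t)                # bias-corrected estimates
    v_hat = v / (1.0 - beta2 ** t)
    return param - lr * m_hat / (v_hat ** 0.5 + eps), m, v

# Example: one step on a dummy scalar parameter at the Dc learning rate.
p, m, v = 1.0, 0.0, 0.0
p, m, v = adam_step(p, grad=0.5, m=m, v=v, t=1, lr=LEARNING_RATES["Dc"])
```

In a deep learning framework one would simply instantiate four optimizers, one per network, with these learning rates and betas; the sketch only makes the update rule behind those settings explicit.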