How Control Information Influences Multilingual Text Image Generation and Editing?

Authors: Boqiang Zhang, Zuan Gao, Yadong Qu, Hongtao Xie

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method achieves state-of-the-art performance in both Chinese and English text generation.
Researcher Affiliation | Academia | Boqiang Zhang, Zuan Gao, Yadong Qu, Hongtao Xie, University of Science and Technology of China; {cyril,zuangao,qqqyd}@mail.ustc.edu.cn, htxie@ustc.edu.cn
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code and dataset are available at https://github.com/CyrilSterling/TextGen.
Open Datasets | Yes | Therefore, we introduce TG2M, a multilingual dataset sourced from publicly available images including MARIO-10M [5], Wukong [8], TextSeg [31], ArT [6], LSVT [27], MLT [16], ReCTS [37]. The code and dataset are available at https://github.com/CyrilSterling/TextGen.
Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages, sample counts, or detailed splitting methodology) for training, validation, and testing.
Hardware Specification | Yes | We train our model on the TG2M dataset using 8 NVIDIA A40 GPUs with a batch size of 176.
Software Dependencies | No | The paper mentions 'SD1.5' and 'diffusers' (with a GitHub link for diffusers), but does not provide specific version numbers for key software dependencies such as PyTorch, Python, or CUDA for full reproducibility. A hedged loading sketch is given after this table.
Experiment Setup | Yes | We train our model on the TG2M dataset using 8 NVIDIA A40 GPUs with a batch size of 176. Our model converges rapidly and requires only 5 epochs of training. The learning rate is set to 1e-5. During inference, the Fourier balance factors α, β, and s are set to 1.4, 1.2, and 0.2, respectively. These values are collected in the configuration sketch after this table.
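Because the paper names only SD1.5 and the diffusers library without pinning versions, the following is a minimal sketch of how the backbone could be loaded. The checkpoint identifier runwayml/stable-diffusion-v1-5 and the use of StableDiffusionPipeline are assumptions based on the standard Hugging Face diffusers API, not details confirmed by the paper.

    # Minimal sketch (not from the paper): loading an SD1.5 backbone with the
    # Hugging Face diffusers library. Checkpoint id and dtype are assumptions.
    import torch
    from diffusers import StableDiffusionPipeline

    model_id = "runwayml/stable-diffusion-v1-5"  # assumed SD1.5 checkpoint
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")

    # Generate a sample image containing rendered text.
    image = pipe("a shop sign that reads 'TextGen'").images[0]
    image.save("sample.png")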
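The hyperparameters reported in the Experiment Setup row can be summarized in a small configuration sketch. The field names are illustrative assumptions; only the numeric values come from the paper.

    # Hypothetical configuration collecting the reported hyperparameters.
    # Field names are illustrative; values are those stated in the paper.
    train_config = {
        "dataset": "TG2M",
        "gpus": 8,             # NVIDIA A40
        "batch_size": 176,
        "epochs": 5,
        "learning_rate": 1e-5,
    }
    inference_config = {
        # Fourier balance factors alpha, beta, s used during inference.
        "alpha": 1.4,
        "beta": 1.2,
        "s": 0.2,
    }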