How Control Information Influences Multilingual Text Image Generation and Editing?
Authors: Boqiang Zhang, Zuan Gao, Yadong Qu, Hongtao Xie
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method achieves state-of-the-art performance in both Chinese and English text generation. |
| Researcher Affiliation | Academia | Boqiang Zhang, Zuan Gao, Yadong Qu, Hongtao Xie, University of Science and Technology of China, {cyril,zuangao,qqqyd}@mail.ustc.edu.cn, htxie@ustc.edu.cn |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and dataset are available at https://github.com/CyrilSterling/TextGen. |
| Open Datasets | Yes | Therefore, we introduce TG2M, a multilingual dataset sourced from publicly available images including MARIO-10M [5], Wukong [8], TextSeg [31], ArT [6], LSVT [27], MLT [16], ReCTS [37]. The code and dataset are available at https://github.com/CyrilSterling/TextGen. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages, sample counts, or detailed splitting methodology) for training, validation, and testing. |
| Hardware Specification | Yes | We train our model on the TG2M dataset using 8 NVIDIA A40 GPUs with a batch size of 176. |
| Software Dependencies | No | The paper mentions 'SD1.5' and 'diffusers' (with a GitHub link for diffusers), but does not provide specific version numbers for key software dependencies such as PyTorch, Python, or CUDA for full reproducibility. |
| Experiment Setup | Yes | We train our model on the TG2M dataset using 8 NVIDIA A40 GPUs with a batch size of 176. Our model converges rapidly and requires only 5 epochs of training. The learning rate is set to 1e-5. During inference, the Fourier balance factors α, β, and s are set to 1.4, 1.2, and 0.2, respectively. |
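
To make the reported setup easier to reuse, the sketch below collects the hyperparameters quoted in the Experiment Setup row into a single configuration object. Note that `TrainConfig`, its field names, and the assumption of an even per-GPU split of the global batch size are illustrative choices, not details taken from the paper or its released code.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    """Hypothetical container for the training/inference settings reported in the paper."""

    # Training settings quoted in the Experiment Setup row.
    num_gpus: int = 8              # NVIDIA A40
    global_batch_size: int = 176   # total batch size across all GPUs
    epochs: int = 5
    learning_rate: float = 1e-5

    # Fourier balance factors used at inference time.
    alpha: float = 1.4
    beta: float = 1.2
    s: float = 0.2

    @property
    def per_gpu_batch_size(self) -> int:
        # 176 / 8 = 22 samples per GPU, assuming an even split (the paper does not state this).
        return self.global_batch_size // self.num_gpus


if __name__ == "__main__":
    cfg = TrainConfig()
    print(cfg)
    print("per-GPU batch size:", cfg.per_gpu_batch_size)
```

Since the paper does not specify the distributed-training framework or per-GPU batching, anyone reproducing the setup would still need to decide how the global batch size of 176 is mapped onto the 8 GPUs.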