Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

How Control Information Influences Multilingual Text Image Generation and Editing?

Authors: Boqiang Zhang, Zuan Gao, Yadong Qu, Hongtao Xie

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method achieves state-of-the-art performance in both Chinese and English text generation.
Researcher Affiliation | Academia | Boqiang Zhang, Zuan Gao, Yadong Qu, Hongtao Xie; University of Science and Technology of China
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code and dataset are available at https://github.com/CyrilSterling/TextGen.
Open Datasets | Yes | Therefore, we introduce TG2M, a multilingual dataset sourced from publicly available images including MARIO-10M [5], Wukong [8], TextSeg [31], ArT [6], LSVT [27], MLT [16], ReCTS [37]. The code and dataset are available at https://github.com/CyrilSterling/TextGen.
Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages, sample counts, or detailed splitting methodology) for training, validation, and testing.
Hardware Specification | Yes | We train our model on the TG2M dataset using 8 NVIDIA A40 GPUs with a batch size of 176.
Software Dependencies | No | The paper mentions 'SD1.5' and 'diffusers' (with a GitHub link for diffusers), but does not provide specific version numbers for key software dependencies such as PyTorch, Python, or CUDA for full reproducibility.
Experiment Setup | Yes | We train our model on the TG2M dataset using 8 NVIDIA A40 GPUs with a batch size of 176. Our model converges rapidly and requires only 5 epochs of training. The learning rate is set to 1e-5. During inference, the Fourier balance factors α, β, and s are set to 1.4, 1.2, and 0.2, respectively.
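
For anyone attempting a reproduction, the reported setup can be collected into a single configuration object. The following is a minimal sketch and not the authors' code: the class name `ExperimentSetup`, its field names, and the helper `per_gpu_batch_size` are hypothetical, and it assumes the batch size of 176 is the global batch size across all 8 GPUs, which the quoted text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Hyperparameters as quoted in the Experiment Setup row (names are hypothetical)."""

    num_gpus: int = 8                # "8 NVIDIA A40 GPUs"
    gpu_model: str = "NVIDIA A40"
    batch_size: int = 176            # assumed to be the global batch size
    epochs: int = 5                  # "requires only 5 epochs of training"
    learning_rate: float = 1e-5
    # Fourier balance factors used at inference time
    alpha: float = 1.4
    beta: float = 1.2
    s: float = 0.2

    def per_gpu_batch_size(self) -> int:
        """Split the (assumed global) batch evenly across GPUs."""
        if self.batch_size % self.num_gpus != 0:
            raise ValueError("batch size is not divisible by the number of GPUs")
        return self.batch_size // self.num_gpus


setup = ExperimentSetup()
# Under the global-batch assumption, each GPU would process 176 / 8 samples per step.
print(setup.per_gpu_batch_size())
```

If 176 is instead the per-GPU batch size, the helper does not apply and the effective global batch would be larger; the paper's repository is the place to confirm which reading is correct.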