Text Image Inpainting via Global Structure-Guided Diffusion Models

Authors: Shipeng Zhu, Pengfei Fang, Chenjie Zhu, Zuoyan Zhao, Qiang Xu, Hui Xue

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The efficacy of our approach is demonstrated by a thorough empirical study, including a substantial boost in both recognition accuracy and image quality.
Researcher Affiliation | Academia | School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China. {shipengzhu, fangpengfei, chenjiezhu, zuoyanzhao, 220232307, hxue}@seu.edu.cn
Pseudocode | No | The paper describes the model architecture and training procedures textually and with diagrams, but does not provide formal pseudocode blocks or algorithms.
Open Source Code | Yes | Code and datasets are available at: https://github.com/blackprotoss/GSDM.
Open Datasets | Yes | Code and datasets are available at: https://github.com/blackprotoss/GSDM. For handwritten text, the TII-HT dataset comprises 40,078 images from the IAM dataset (Marti and Bunke 2002).
Dataset Splits | Yes | For fairness in evaluation, we divide our proposed datasets into distinct training and testing sets. In the TII-ST dataset... our training set consists of 80,000 synthesized images and 4,877 real images. Meanwhile, the testing set includes 1,599 real images. For the TII-HT dataset, the training set comprises 38,578 images sourced from IAM, while the testing set contains 1,600 images. (A hypothetical split-layout sketch follows the table.)
Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not provide version numbers for any software dependencies or libraries used (e.g., Python or PyTorch versions).
Experiment Setup | No | While image resizing to 64x256 is mentioned as a preprocessing step, the paper does not provide specific hyperparameters (e.g., learning rate, batch size, number of epochs, optimizer settings) or detailed training configurations. (A hedged resizing sketch follows the table.)
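The split counts reported under Dataset Splits can be sanity-checked against the released data. The directory layout, file extensions, and the load_split helper below are hypothetical illustrations for this report, not the structure of the authors' GitHub release; treat this as a minimal sketch under those assumptions.

import os
from dataclasses import dataclass
from typing import List

# Hypothetical on-disk layout (not taken from the GSDM repository):
# data/
#   TII-ST/train/   -> 80,000 synthesized + 4,877 real images
#   TII-ST/test/    ->  1,599 real images
#   TII-HT/train/   -> 38,578 images sourced from IAM
#   TII-HT/test/    ->  1,600 images

@dataclass
class Split:
    name: str
    image_paths: List[str]

def load_split(root: str, dataset: str, split: str) -> Split:
    """Collect image paths for one dataset split, e.g. TII-ST / test."""
    split_dir = os.path.join(root, dataset, split)
    exts = (".png", ".jpg", ".jpeg")
    paths = sorted(
        os.path.join(split_dir, f)
        for f in os.listdir(split_dir)
        if f.lower().endswith(exts)
    )
    return Split(name=f"{dataset}/{split}", image_paths=paths)

if __name__ == "__main__":
    # Counts reported in the paper, used here only as a cross-check.
    expected = {("TII-ST", "test"): 1599, ("TII-HT", "train"): 38578, ("TII-HT", "test"): 1600}
    for (dataset, split), count in expected.items():
        s = load_split("data", dataset, split)
        print(f"{s.name}: found {len(s.image_paths)} images (paper reports {count})")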
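For the 64x256 resizing mentioned under Experiment Setup, the following sketch shows one plausible preprocessing step using Pillow. It assumes 64 is the height and 256 the width and uses bilinear resampling; neither detail is specified in the paper, and the function name and file paths are hypothetical.

from PIL import Image

def preprocess(path: str, height: int = 64, width: int = 256) -> Image.Image:
    """Load a text image and resize it to the 64x256 resolution noted in the paper.

    The (height, width) interpretation of 64x256 and the bilinear filter are
    assumptions; the paper does not specify either.
    """
    img = Image.open(path).convert("RGB")
    # PIL's resize expects (width, height).
    return img.resize((width, height), resample=Image.BILINEAR)

# Example usage with a hypothetical file path:
# resized = preprocess("data/TII-ST/test/sample_0001.png")
# resized.save("sample_0001_64x256.png")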