DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models

Authors: XiMing Xing, Chuang Wang, Haitao Zhou, Jing Zhang, Qian Yu, Dong Xu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that DiffSketcher achieves greater quality than prior work. The code and demo of DiffSketcher can be found at https://ximinng.github.io/DiffSketcher-project/.
Researcher Affiliation | Academia | Ximing Xing, Beihang University, ximingxing@buaa.edu.cn; Chuang Wang, Beihang University, chuangwang@buaa.edu.cn; Haitao Zhou, Beihang University, zhouhaitao@buaa.edu.cn; Jing Zhang, Beihang University, zhang_jing@buaa.edu.cn; Qian Yu, Beihang University, qianyu@buaa.edu.cn; Dong Xu, The University of Hong Kong, dongxu@cs.hku.hk
Pseudocode | No | The paper describes algorithms and processes but does not include a formally labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | The code and demo of DiffSketcher can be found at https://ximinng.github.io/DiffSketcher-project/.
Open Datasets | Yes | Recent breakthroughs in text-to-image generation have been driven by diffusion models [23, 28, 30, 31] trained on billions of image-text pairs [34]. [34] Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, et al. LAION-5B: An open large-scale dataset for training next generation image-text models. arXiv preprint arXiv:2210.08402, 2022.
Dataset Splits | No | The paper leverages a pre-trained latent diffusion model and an optimization-based approach for sketch synthesis. It does not mention traditional dataset splits (e.g., train/validation/test percentages or counts) for its own training or evaluation data.
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using a 'DDIM solver', 'torch.transforms', and 'Adam optimizers' but does not provide specific version numbers for these software components or the underlying programming language/frameworks.
Experiment Setup | Yes | Specifically, given a text prompt, we use a DDIM solver [38] to sample a raster image from the latent diffusion model in 100 steps with classifier-free guidance [12], using a scale of ω = 7.5. For classifier-free guidance, we set ω = 100... we set the learning rate of the control point optimizer to 1.0 and the color optimizer to 0.1. ...we use layers 3 and 4 of the ResNet101 CLIP model. ...we sample a noise level t from the uniform distribution U(0.05, 0.95).
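
The settings quoted under Experiment Setup can be collected into a small configuration object. What follows is a minimal Python sketch under stated assumptions, not the authors' code: the class `SketchSetup`, its field names, and the helpers `sample_noise_level` and `guided_noise` are all hypothetical. The guidance formula shown is the common form eps_u + ω (eps_c - eps_u) found in many latent diffusion implementations; the excerpt does not confirm which convention the paper uses, nor which stage the second guidance value (ω = 100) belongs to.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchSetup:
    """Values quoted in the Experiment Setup row; all field names are hypothetical."""
    ddim_steps: int = 100               # DDIM steps used to sample the raster image
    sampling_guidance: float = 7.5      # classifier-free guidance scale for sampling
    guidance_scale_second: float = 100.0  # second quoted omega; its stage is elided in the excerpt
    lr_control_points: float = 1.0      # learning rate of the control point optimizer
    lr_color: float = 0.1               # learning rate of the color optimizer
    clip_layers: tuple = (3, 4)         # layers of the ResNet101 CLIP model that are used
    t_min: float = 0.05                 # lower bound of the noise-level distribution
    t_max: float = 0.95                 # upper bound of the noise-level distribution


def sample_noise_level(cfg, rng):
    """Draw a noise level t from U(t_min, t_max)."""
    return rng.uniform(cfg.t_min, cfg.t_max)


def guided_noise(eps_uncond, eps_cond, omega):
    """Combine noise predictions with classifier-free guidance.

    Assumed convention: eps_u + omega * (eps_c - eps_u), applied elementwise.
    """
    return [u + omega * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    cfg = SketchSetup()
    rng = random.Random(0)  # fixed seed so repeated runs draw the same t
    t = sample_noise_level(cfg, rng)
    print(f"sampled noise level t = {t:.3f}")
    # Toy two-element "noise predictions" standing in for real model outputs.
    print(guided_noise([0.0, 1.0], [1.0, 1.0], cfg.sampling_guidance))
```

With ω = 1 the combination reduces to the conditional prediction, and larger values push the result further along the direction from the unconditional to the conditional prediction, which is why a scale of 100 is a much stronger push than 7.5.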