DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models
Authors: XiMing Xing, Chuang Wang, Haitao Zhou, Jing Zhang, Qian Yu, Dong Xu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that DiffSketcher achieves greater quality than prior work. The code and demo of DiffSketcher can be found at https://ximinng.github.io/DiffSketcher-project/. |
| Researcher Affiliation | Academia | Ximing Xing, Beihang University, ximingxing@buaa.edu.cn; Chuang Wang, Beihang University, chuangwang@buaa.edu.cn; Haitao Zhou, Beihang University, zhouhaitao@buaa.edu.cn; Jing Zhang, Beihang University, zhang_jing@buaa.edu.cn; Qian Yu, Beihang University, qianyu@buaa.edu.cn; Dong Xu, The University of Hong Kong, dongxu@cs.hku.hk |
| Pseudocode | No | The paper describes algorithms and processes but does not include a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | The code and demo of DiffSketcher can be found at https://ximinng.github.io/DiffSketcher-project/. |
| Open Datasets | Yes | Recent breakthroughs in text-to-image generation have been driven by diffusion models [23, 28, 30, 31] trained on billions of image-text pairs [34]. [34] Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, et al. LAION-5B: An open large-scale dataset for training next generation image-text models. arXiv preprint arXiv:2210.08402, 2022. |
| Dataset Splits | No | The paper leverages a pre-trained latent diffusion model and an optimization-based approach for sketch synthesis. It does not mention traditional dataset splits (e.g., train/validation/test percentages or counts) for its own training or evaluation data. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using a 'DDIM solver', 'torch.transforms', and 'Adam optimizers' but does not provide specific version numbers for these software components or the underlying programming language/frameworks. |
| Experiment Setup | Yes | Specifically, given a text prompt, we use a DDIM solver [38] to sample a raster image from the latent diffusion model in 100 steps with classifier-free guidance [12], using a scale of ω = 7.5. For classifier-free guidance, we set ω = 100... we set the learning rate of the control point optimizer to 1.0 and the color optimizer to 0.1. ...we use layers 3 and 4 of the ResNet101 CLIP model. ...we sample a noise level t from the uniform distribution U(0.05, 0.95). |
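
The values quoted in the Experiment Setup row can be gathered into a single configuration object, which may make them easier to compare against the released code. The following is only a minimal sketch: the class name `SketchSetup`, its field names, and the helper `sample_noise_level` are hypothetical and do not come from the paper or its repository. The row quotes two guidance scales (ω = 7.5 and ω = 100) and elides the context of the second, so the sketch keeps both without asserting which stage the second one belongs to.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchSetup:
    """Hyperparameters quoted in the Experiment Setup row.

    All field names are hypothetical; only the values come from the table.
    """
    ddim_steps: int = 100                   # DDIM sampling steps
    sampling_guidance_scale: float = 7.5    # omega used when sampling the raster image
    second_guidance_scale: float = 100.0    # second omega; its context is elided in the row
    control_point_lr: float = 1.0           # learning rate, control point optimizer
    color_lr: float = 0.1                   # learning rate, color optimizer
    clip_backbone: str = "ResNet101"        # CLIP image encoder named in the row
    clip_layers: tuple = (3, 4)             # CLIP layers used
    t_min: float = 0.05                     # lower bound of the noise-level distribution
    t_max: float = 0.95                     # upper bound of the noise-level distribution


def sample_noise_level(setup: SketchSetup, rng: random.Random) -> float:
    """Draw a noise level t from U(t_min, t_max), as the row describes."""
    return rng.uniform(setup.t_min, setup.t_max)


if __name__ == "__main__":
    setup = SketchSetup()
    rng = random.Random(0)  # fixed seed so repeated runs draw the same values
    print(setup)
    print([round(sample_noise_level(setup, rng), 3) for _ in range(3)])
```

Freezing the dataclass keeps the quoted values from being changed by accident during a run, and passing an explicit `random.Random` instance keeps the noise-level draws reproducible.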