AirSketch: Generative Motion to Sketch
Authors: Hui Xian Grace Lim, Xuanming Cui, Yogesh S. Rawat, Ser-Nam Lim
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our findings demonstrate that beyond producing photo-realistic images from precise spatial inputs, controllable image diffusion can effectively produce a refined, clear sketch from a noisy input. We present two air drawing datasets to study this problem. |
| Researcher Affiliation | Academia | Hui Xian Grace Lim (hxgrace@ucf.edu), Xuanming Cui (xu979022@ucf.edu), Yogesh S. Rawat (yogesh@crcv.ucf.edu), Ser-Nam Lim (sernam@ucf.edu), University of Central Florida |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and dataset are available under: https://github.com/hxgr4ce/DoodleFusion. |
| Open Datasets | Yes | We use samples from the Quick, Draw! dataset [22] as the intended ground-truth sketch; each stroke is represented as a sequence of timestamps and coordinates. Code and dataset are available under: https://github.com/hxgr4ce/DoodleFusion. A parsing sketch of this stroke format appears after the table. |
| Dataset Splits | No | The paper describes splitting the Quick, Draw! dataset into training and held-out categories for evaluating generalizability, but does not explicitly mention a separate validation split for model tuning. 'In training, we use a subset of 100 categories from the Quick, Draw! dataset [22]. To test for generalizability, we select 10 categories with similar statistics from the rest and exclude them from training.' A hypothetical split sketch appears after the table. |
| Hardware Specification | Yes | All training is conducted on two Nvidia H100 GPUs. |
| Software Dependencies | No | The paper mentions key software components like Stable Diffusion XL, ControlNet, and Low-Rank Adaptation (LoRA), but does not provide specific version numbers for these or other underlying software dependencies (e.g., Python, PyTorch, CUDA). |
| Experiment Setup | Yes | For finetuning the diffusion model with LoRA, we set the LoRA rank to 4, per-device batch size to 16, learning rate to 5e-5, gradient accumulation steps to 4, and train for 6K steps on the Quick, Draw! dataset. For our augmentation-based ControlNet training, we set per-device batch size to 8, learning rate to 2e-5, gradient accumulation steps to 4, and the proportion of empty prompts to 25%. These hyperparameters are collected in the configuration sketch after the table. |
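The Open Datasets row quotes the paper's description of Quick, Draw! strokes as sequences of timestamps and coordinates. Below is a minimal parsing sketch assuming the public raw Quick, Draw! ndjson release, where each stroke is stored as three parallel arrays; the file name is a placeholder, and nothing here is taken from the paper's code:

```python
import json

def load_strokes(ndjson_path):
    """Yield (category, strokes) pairs from a raw Quick, Draw! ndjson file.

    In the raw release, each line is a JSON record whose "drawing" field
    holds a list of strokes, and each stroke is three parallel arrays:
    [xs, ys, timestamps].
    """
    with open(ndjson_path) as f:
        for line in f:
            record = json.loads(line)
            strokes = [
                list(zip(ts, xs, ys))  # (timestamp, x, y) triples for one stroke
                for xs, ys, ts in record["drawing"]
            ]
            yield record["word"], strokes

# Usage: inspect the first point of the first stroke of the first drawing.
for category, strokes in load_strokes("full_raw_cat.ndjson"):
    print(category, strokes[0][0])
    break
```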
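The Dataset Splits row reports 100 training categories and 10 held-out categories chosen for similar statistics, without giving selection code. The sketch below substitutes a seeded random draw for the paper's similarity criterion; the helper name `split_categories`, the placeholder category names, and the seed are all illustrative assumptions:

```python
import random

def split_categories(all_categories, n_train=100, n_heldout=10, seed=0):
    """Pick disjoint training and held-out category sets.

    The paper selects held-out categories "with similar statistics" to the
    training ones; this sketch replaces that criterion with a seeded random
    draw purely for illustration.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(all_categories), n_train + n_heldout)
    return chosen[:n_train], chosen[n_train:]

# Quick, Draw! ships 345 categories; placeholder names stand in for them here.
train_cats, heldout_cats = split_categories([f"category_{i}" for i in range(345)])
print(len(train_cats), len(heldout_cats))  # -> 100 10
```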
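The Experiment Setup row lists hyperparameters for the two training stages. The sketch below simply collects those reported values and derives the effective batch size using the two H100 GPUs from the Hardware row; the dict layout and helper function are assumptions for illustration, not the authors' code (the paper does not say which training scripts were used):

```python
# Reported hyperparameters for the two training stages (values from the paper).
lora_finetune = dict(
    lora_rank=4,
    per_device_batch_size=16,
    learning_rate=5e-5,
    gradient_accumulation_steps=4,
    max_train_steps=6_000,
)
controlnet_train = dict(
    per_device_batch_size=8,
    learning_rate=2e-5,
    gradient_accumulation_steps=4,
    proportion_empty_prompts=0.25,
)

NUM_GPUS = 2  # the Hardware row reports two Nvidia H100 GPUs

def effective_batch_size(cfg, num_gpus=NUM_GPUS):
    """Effective batch = per-device batch * number of GPUs * grad-accumulation steps."""
    return cfg["per_device_batch_size"] * num_gpus * cfg["gradient_accumulation_steps"]

print(effective_batch_size(lora_finetune))    # -> 128
print(effective_batch_size(controlnet_train)) # -> 64
```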