AirSketch: Generative Motion to Sketch
Authors: Hui Xian Grace Lim, Xuanming Cui, Yogesh S. Rawat, Ser-Nam Lim
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our findings demonstrate that beyond producing photo-realistic images from precise spatial inputs, controllable image diffusion can effectively produce a refined, clear sketch from a noisy input. We present two air drawing datasets to study this problem. |
| Researcher Affiliation | Academia | Hui Xian Grace Lim (hxgrace@ucf.edu), Xuanming Cui (xu979022@ucf.edu), Yogesh S. Rawat (yogesh@crcv.ucf.edu), Ser-Nam Lim (sernam@ucf.edu), University of Central Florida |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and dataset are available under: https://github.com/hxgr4ce/DoodleFusion. |
| Open Datasets | Yes | We use samples from the Quick, Draw! dataset [22] as the intended ground-truth sketch; each stroke is represented as a sequence of timestamps and coordinates. Code and dataset are available under: https://github.com/hxgr4ce/DoodleFusion. A parsing sketch of this stroke format appears after the table. |
| Dataset Splits | No | The paper describes splitting the Quick, Draw! dataset into training and held-out categories for evaluating generalizability, but does not explicitly mention a separate validation split for model tuning. 'In training, we use a subset of 100 categories from the Quick, Draw! dataset [22]. To test for generalizability, we select 10 categories with similar statistics from the rest and exclude them from training.' A hypothetical split sketch appears after the table. |
| Hardware Specification | Yes | All training is conducted on two Nvidia H100 GPUs. |
| Software Dependencies | No | The paper mentions key software components like Stable Diffusion XL, ControlNet, and Low-Rank Adaptation (LoRA), but does not provide specific version numbers for these or other underlying software dependencies (e.g., Python, PyTorch, CUDA). |
| Experiment Setup | Yes | For finetuning the diffusion model with LoRA, we set the LoRA rank to 4, per-device batch size to 16, learning rate to 5e-5, gradient accumulation steps to 4, and train for 6K steps on the Quick, Draw! dataset. For our augmentation-based ControlNet training, we set per-device batch size to 8, learning rate to 2e-5, gradient accumulation steps to 4, and the proportion of empty prompts to 25%. These hyperparameters are collected in the configuration sketch after the table. |
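The Open Datasets row quotes the paper's description of Quick, Draw! strokes as sequences of timestamps and coordinates. Below is a minimal parsing sketch assuming the public raw Quick, Draw! ndjson release, where each stroke is stored as three parallel arrays; the file name is a placeholder, and nothing here is taken from the paper's code:

```python
import json

def load_strokes(ndjson_path):
    """Yield (category, strokes) pairs from a raw Quick, Draw! ndjson file.

    In the raw release, each line is a JSON record whose "drawing" field
    holds a list of strokes, and each stroke is three parallel arrays:
    [xs, ys, timestamps].
    """
    with open(ndjson_path) as f:
        for line in f:
            record = json.loads(line)
            strokes = [
                list(zip(ts, xs, ys))  # (timestamp, x, y) triples for one stroke
                for xs, ys, ts in record["drawing"]
            ]
            yield record["word"], strokes

# Usage: inspect the first point of the first stroke of the first drawing.
for category, strokes in load_strokes("full_raw_cat.ndjson"):
    print(category, strokes[0][0])
    break
```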
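The Dataset Splits row reports 100 training categories and 10 held-out categories chosen for similar statistics, without giving selection code. The sketch below substitutes a seeded random draw for the paper's similarity criterion; the helper name `split_categories`, the placeholder category names, and the seed are all illustrative assumptions:

```python
import random

def split_categories(all_categories, n_train=100, n_heldout=10, seed=0):
    """Pick disjoint training and held-out category sets.

    The paper selects held-out categories "with similar statistics" to the
    training ones; this sketch replaces that criterion with a seeded random
    draw purely for illustration.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(all_categories), n_train + n_heldout)
    return chosen[:n_train], chosen[n_train:]

# Quick, Draw! ships 345 categories; placeholder names stand in for them here.
train_cats, heldout_cats = split_categories([f"category_{i}" for i in range(345)])
print(len(train_cats), len(heldout_cats))  # -> 100 10
```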
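The Experiment Setup row lists hyperparameters for the two training stages. The sketch below simply collects those reported values and derives the effective batch size using the two H100 GPUs from the Hardware row; the dict layout and helper function are assumptions for illustration, not the authors' code (the paper does not say which training scripts were used):

```python
# Reported hyperparameters for the two training stages (values from the paper).
lora_finetune = dict(
    lora_rank=4,
    per_device_batch_size=16,
    learning_rate=5e-5,
    gradient_accumulation_steps=4,
    max_train_steps=6_000,
)
controlnet_train = dict(
    per_device_batch_size=8,
    learning_rate=2e-5,
    gradient_accumulation_steps=4,
    proportion_empty_prompts=0.25,
)

NUM_GPUS = 2  # the Hardware row reports two Nvidia H100 GPUs

def effective_batch_size(cfg, num_gpus=NUM_GPUS):
    """Effective batch = per-device batch * number of GPUs * grad-accumulation steps."""
    return cfg["per_device_batch_size"] * num_gpus * cfg["gradient_accumulation_steps"]

print(effective_batch_size(lora_finetune))    # -> 128
print(effective_batch_size(controlnet_train)) # -> 64
```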