Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
Universal Few-shot Spatial Control for Diffusion Models
Authors: Kiet Nguyen, Chanhyuk Lee, Donggyun Kim, Dong Hoon Lee, Seunghoon Hong
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments across diverse spatial control modalities, including edge maps, depth maps, normal maps, human pose, and segmentation masks for human body, demonstrate that UFC outperforms prior approaches in few-shot settings by large margins. |
| Researcher Affiliation | Academia | Kiet T. Nguyen, Chanhyuk Lee, Donggyun Kim, Dong Hoon Lee, Seunghoon Hong; KAIST; EMAIL |
| Pseudocode | No | The paper only describes steps and mathematical formulations in paragraph text without a structured pseudocode or algorithm block. |
| Open Source Code | Yes | Code is available at https://github.com/kietngt00/UFC. |
| Open Datasets | Yes | To enable episodic meta-training, we sample 300K text-image pairs from LAION400M [45], with 150K containing humans (for Pose and Densepose) and 150K randomly sampled. ... We quantitatively evaluate all methods on COCO 2017 [30] validation split, which contains 5,000 images with associated captions. |
| Dataset Splits | Yes | To enable episodic meta-training, we sample 300K text-image pairs from LAION400M [45], with 150K containing humans (for Pose and Densepose) and 150K randomly sampled. ... We quantitatively evaluate all methods on COCO 2017 [30] validation split, which contains 5,000 images with associated captions. All images and their corresponding conditions are resized to 512 × 512. |
| Hardware Specification | Yes | Training: 8 NVIDIA RTX 3090 GPUs, batch size 6 per GPU, 16GB memory per GPU, 12 hours. Fine-tuning: 1 NVIDIA RTX 3090 GPU, batch size 10 per GPU, 21GB memory per GPU, <= 1 hour. Inference: 1 NVIDIA RTX 3090 GPU, batch size 8 per GPU, 11.5GB memory per GPU, 3.06 s / image. ... We train our model on 8 NVIDIA RTX A6000 GPUs. |
| Software Dependencies | No | The paper names specific models such as "Stable Diffusion v1.5" and algorithms such as "AdamW" and the "PNDM sampler", with citations, but it does not give version numbers for key software components or libraries (e.g., Python, PyTorch, CUDA), as the guidelines require. |
| Experiment Setup | Yes | Implementation details: We train UFC (UNet diffusion backbone) for 12.5K iterations with a batch size of 96 on 8 NVIDIA RTX 3090 GPUs, using AdamW [33] with a learning rate of 1 × 10⁻⁵. During inference with the UNet backbone, we adopt the PNDM sampler [32] with 50 denoising steps, classifier-free guidance (CFG) [17] scale of 7.5, and seed 42. ... Hyperparameters: We train using the AdamW optimizer [33] with a learning rate of 1 × 10⁻⁵ and a weight decay of 0.01. For each training batch, we randomly select two tasks per batch, each accompanied by a support set of three example pairs sampled for its query condition. |
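
The settings quoted in the Experiment Setup and Dataset Splits rows can be gathered into one place for anyone attempting a rerun. The following is a minimal sketch, not the authors' configuration format: the class names `TrainConfig` and `InferenceConfig` and every field name are hypothetical and chosen for illustration, and only the values are taken from the quoted text above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainConfig:
    """Training values quoted in the Experiment Setup row (field names are hypothetical)."""
    iterations: int = 12_500          # "12.5K iterations"
    batch_size: int = 96              # "batch size of 96"
    num_gpus: int = 8                 # "8 NVIDIA RTX 3090 GPUs"
    optimizer: str = "AdamW"
    learning_rate: float = 1e-5
    weight_decay: float = 0.01
    tasks_per_batch: int = 2          # "two tasks per batch"
    support_pairs_per_task: int = 3   # "support set of three example pairs"


@dataclass(frozen=True)
class InferenceConfig:
    """Inference values quoted for the UNet backbone (field names are hypothetical)."""
    sampler: str = "PNDM"
    denoising_steps: int = 50
    cfg_scale: float = 7.5            # classifier-free guidance scale
    seed: int = 42
    image_size: int = 512             # images and conditions resized to 512 x 512


train_cfg = TrainConfig()
infer_cfg = InferenceConfig()

# Print both configurations as plain dictionaries for logging or comparison.
print(asdict(train_cfg))
print(asdict(infer_cfg))
```

Frozen dataclasses are used here so the recorded values cannot be changed accidentally during a run. Any real reproduction should take its settings from the released code at https://github.com/kietngt00/UFC rather than from this sketch.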