Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Diffusion on Demand: Selective Caching and Modulation for Efficient Generation

Authors: Hee Min Choi, Hyoa Kang, Dokwan Oh, Nam Ik Cho

NeurIPS 2025 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments show that our method achieves similar generation performance to the original sampler while requiring significantly less computation. For example, FLOPs and inference latency are reduced by 2.93× and 2.15× for DiT-XL/2 and by 2.83× and 1.50× for PixArt-α, respectively. We find that modulation is effective when applied to as little as 2% of layers, resulting in negligible computation overhead. |
| Researcher Affiliation | Collaboration | Hee Min Choi (1) EMAIL, Hyoa Kang (1, 2) EMAIL, Dokwan Oh (1) EMAIL, Nam Ik Cho (2) EMAIL. (1) Samsung Electronics Co., Ltd.; (2) Seoul National University |
| Pseudocode | Yes | Algorithm 1 Training, Algorithm 2 Sampling (and similar algorithms in the appendix: Algorithm 3 Training, Algorithm 4 Sampling) |
| Open Source Code | No | Answer: [No] Justification: No, the code is not open-sourced, but the paper includes pseudocode, model links, and implementation details in the supplementary material to support accessibility. |
| Open Datasets | Yes | We use the training set of ImageNet [7] for DiT-XL/2 and COCO2014 [21] for text-to-image generation models. |
| Dataset Splits | Yes | For class-conditional image generation, we generate 50K images with a resolution of 256×256 by randomly sampling 1K classes on ImageNet. For text-to-image generation, we randomly select 5K/30K captions from the COCO2014 validation set and generate one image per caption, testing at different resolutions: 256×256, 512×512 and 1024×1024. We uploaded the list files containing 5K/30K validation prompts to an anonymous Github page, and the URLs are provided in Table 9. |
| Hardware Specification | Yes | We train the modulation gate scores and modulator parameters on 4 H100 GPUs with AdamW optimizer [24] and learning rate 10^-3 for 200K iterations. The hardware setup consists of a single NVIDIA H100 GPU (80GB HBM3) and an Intel Xeon Gold 6442Y CPU. |
| Software Dependencies | Yes | The evaluation was performed using CUDA 12.1, Python 3.10, and PyTorch 2.2.2. |
| Experiment Setup | Yes | The optimization is performed only on the modulation gate score and modulator... We train the modulation gate scores and modulator parameters on 4 H100 GPUs with AdamW optimizer [24] and learning rate 10^-3 for 200K iterations. The global batch size for 256×256 experiments is set for 64, and the corresponding value for 512×512 and 1024×1024 experiments is 8. Unless specified, the threshold c for the modulation gate is set to 0.9. |
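
For readers who want to reuse the values quoted in the Experiment Setup row, the sketch below collects them into a small Python configuration object. It is only an illustration, not the authors' code (which is not open-sourced): the names `ModulationTrainingConfig`, `GLOBAL_BATCH_SIZE`, and `per_gpu_batch_size` are hypothetical, the learning rate is read as 10^-3, and the even split of the global batch across the 4 GPUs is an assumption that the source does not state.

```python
from dataclasses import dataclass

# Global batch size per image resolution, as quoted in the Experiment Setup row.
GLOBAL_BATCH_SIZE = {256: 64, 512: 8, 1024: 8}


@dataclass(frozen=True)
class ModulationTrainingConfig:
    """Hypothetical container for the reported training hyperparameters."""

    resolution: int                # 256, 512, or 1024
    num_gpus: int = 4              # "4 H100 GPUs"
    optimizer: str = "AdamW"       # optimizer named in the source
    learning_rate: float = 1e-3    # read as 10^-3 (assumption about the garbled exponent)
    iterations: int = 200_000      # "200K iterations"
    gate_threshold: float = 0.9    # default threshold c for the modulation gate

    @property
    def global_batch_size(self) -> int:
        return GLOBAL_BATCH_SIZE[self.resolution]

    @property
    def per_gpu_batch_size(self) -> int:
        # Assumption: the global batch is sharded evenly across GPUs.
        return self.global_batch_size // self.num_gpus

    @property
    def samples_seen(self) -> int:
        # Total training samples processed over the full run.
        return self.iterations * self.global_batch_size


if __name__ == "__main__":
    for res in (256, 512, 1024):
        cfg = ModulationTrainingConfig(resolution=res)
        print(res, cfg.global_batch_size, cfg.per_gpu_batch_size, cfg.samples_seen)
```

Under the even-sharding assumption, the 256×256 setting corresponds to 16 samples per GPU per step, and the 512×512 and 1024×1024 settings to 2 samples per GPU per step.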