Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Learnable Sampler Distillation for Discrete Diffusion Models
Authors: Feiyang Fu, Tongxian Guo, Zhaoqiang Liu
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically evaluate the performance of our proposed LSD approach and its improved version LSD+. Our goal is to validate their ability to generate high-quality samples at low NFEs. We conduct evaluations across diverse settings, including text generation, image generation, and a synthetic sequence task, comparing against various baselines. We highlight that our LSD+ provides an efficient learning process for the coefficients and time schedules, typically requiring 5 minutes on an NVIDIA RTX4090 GPU, compared to around 10 minutes of training time for JYS under the same environment. And the learned student sampler introduces no additional computational burden during sampling. |
| Researcher Affiliation | Academia | Feiyang Fu, Tongxian Guo, Zhaoqiang Liu University of Electronic Science and Technology of China |
| Pseudocode | Yes | We present details of the sampling and training processes for LSD in Algorithms 1 and 2 respectively. (...) We present details of the training and sampling processes for LSD+ in the supplementary material. |
| Open Source Code | Yes | Our code is available at https://github.com/feiyangfu/LSD. |
| Open Datasets | Yes | For the text generation task, we employed three pre-trained DDM backbones for validation, namely SEDD-small [6], SEDD-medium [6], and RADD [9]. These are absorbing DDMs of GPT-2 level for text generation, trained on the OpenWebText dataset [44]. (...) We also validate our LSD+ approach on the image generation task for the CIFAR-10 dataset [46]. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. It mentions generating "1024 samples, each containing 1024 tokens" for text generation and evaluating "50k samples" for image generation, but these refer to evaluation data rather than the methodology for splitting the original datasets. |
| Hardware Specification | Yes | We highlight that our LSD+ provides an efficient learning process for the coefficients and time schedules, typically requiring 5 minutes on an NVIDIA RTX4090 GPU, compared to around 10 minutes of training time for JYS under the same environment. (...) All experiments are conducted on an NVIDIA RTX4090 GPU. |
| Software Dependencies | No | The paper mentions specific models and frameworks like "GPT2-large model", "CTMC", "MaskGIT", "Halton sampler", "Ancestral", "ReMasking (ReMDM)", "MDLM", "Llama-3-8B", "Fast DLLMs", "DiffuLLaMA", "DiffuGPT", "DNDM", and "FairSeq" but does not provide specific version numbers for these software components or any other libraries. |
| Experiment Setup | Yes | Specifically, we set the terminate time as 0.0001, the total sampling steps N of the teacher sampler as 1024, and the distance metric d as the KL divergence. We set the number of training samples as 64, the training epoch as 20, and the learning rate as 0.001. |
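
As a rough illustration of the setup quoted in the Experiment Setup row, the sketch below gathers the reported hyperparameters in one place. It also implements a KL divergence between two categorical distributions, which is the distance metric d named there. The names `LSDConfig` and `kl_divergence` are hypothetical. Taking the teacher distribution as the first argument is an assumption, because the quoted text does not state the KL direction. This is a minimal sketch, not the authors' implementation; their code is in the linked repository.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LSDConfig:
    """Hypothetical container for the hyperparameters reported in the paper.

    Field names are illustrative, not taken from the authors' code.
    """
    terminal_time: float = 1e-4   # "terminate time" of the sampling interval
    teacher_steps: int = 1024     # total sampling steps N of the teacher sampler
    num_train_samples: int = 64   # number of training samples
    epochs: int = 20              # training epochs
    learning_rate: float = 1e-3   # learning rate


def kl_divergence(p, q, eps=1e-12):
    """Return KL(p || q) for two categorical distributions given as lists.

    Assumption: p is the teacher distribution and q the student distribution.
    Terms with p_i == 0 contribute nothing; q_i is floored at eps to avoid log(0).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same support size")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            total += p_i * math.log(p_i / max(q_i, eps))
    return total


if __name__ == "__main__":
    cfg = LSDConfig()
    teacher = [0.7, 0.2, 0.1]   # toy teacher distribution over 3 tokens
    student = [0.6, 0.3, 0.1]   # toy student distribution over 3 tokens
    print(cfg)
    print(round(kl_divergence(teacher, student), 4))
```

A frozen dataclass keeps the reported values immutable and in one place. The KL function is written with the standard library only, so it runs without extra dependencies. In an actual distillation loop, this divergence would presumably be averaged over token positions and samples.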