Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

PID-controlled Langevin Dynamics for Faster Sampling of Generative Models

Authors: Hongyi Chen, Jianhai Shu, Jingtao Ding, Yong Li, Xiao-Ping (Steven) Zhang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments across image generation and reasoning tasks demonstrate that PIDLD achieves higher quality with fewer steps, making Langevin-based generative models more practical for efficiency-critical applications. [...] We further conduct multiple comparative experiments on Langevin sampling-based models (EBMs, SGMs) across various tasks (images, solutions of reasoning tasks), demonstrating that our proposed sampler generates faithful samples faster. In particular, PIDLD achieves at least 10× sampling speedup over baselines under SGM model.
Researcher Affiliation | Academia | (1) Shenzhen Key Laboratory of Ubiquitous Data Enabling Laboratory, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; (2) Department of Electronic Engineering, Tsinghua University, Beijing 100084, China; (3) Pengcheng Laboratory, Shenzhen 518055, China
Pseudocode | Yes | Algorithm 1 PIDLD
Open Source Code | Yes | The implementation can be found at https://github.com/tsinghua-fib-lab/PIDLD.
Open Datasets | Yes | Following NCSN [33], we test image generation quality on CIFAR10 (32 × 32) and CelebA (64 × 64) datasets (both are unconditional datasets). [...] Specifically, we evaluate the performance of IRED [7] on Harder Datasets for Sudoku [38, 28] and Connectivity [5] problems.
Dataset Splits | No | The paper mentions using 10000 images for evaluation in image generation tasks and 1000 labeled items for evaluation in reasoning tasks. However, it does not explicitly provide information on the training/validation/test splits used for the underlying models or for its own experimental setup, beyond these evaluation sample counts. It relies on pre-trained models and standard dataset usage without detailing specific splits for its experiments.
Hardware Specification | Yes | We run both algorithms on NVIDIA A800-SXM4-40GB.
Software Dependencies | No | The paper does not explicitly mention any software dependencies with specific version numbers (e.g., Python 3.x, PyTorch 1.x, CUDA 11.x). It refers to standard frameworks like NCSN and IGEBM, but without version specifics for the environment.
Experiment Setup | Yes | For the sampling process, we choose 1280 initial samples uniformly in the square [−8, 8] × [−8, 8]. We use annealed Langevin dynamics where L = 8, T = 150 and ϵ = 8 × 10^-6. We choose {σ_i}_{i=1}^L to be a geometric progression, with σ_1 = 20 and σ_8 = 0.01. [...] We give our hyperparameter settings in Table 4. [...] We give our hyperparameter settings in Table 5. [...] For the step size, we use the default ϵ = 100 for CIFAR10 and ϵ = 400 for CelebA. We keep ϵ fixed across all experiments.
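
The toy sampling setup quoted in the Experiment Setup row can be illustrated with a minimal sketch of plain annealed Langevin dynamics in the style of NCSN. This is not the paper's PIDLD sampler (Algorithm 1 is not reproduced on this page) and is not taken from the linked repository. It assumes the extraction-damaged step size reads ϵ = 8 × 10^-6 and the square is [−8, 8] × [−8, 8]. The score function `toy_score` is a hypothetical stand-in (a standard normal target perturbed by Gaussian noise), and all function names are illustrative.

```python
import numpy as np


def geometric_sigmas(sigma_first=20.0, sigma_last=0.01, num_levels=8):
    """Noise levels forming a geometric progression from sigma_1 to sigma_L."""
    return np.geomspace(sigma_first, sigma_last, num_levels)


def toy_score(x, sigma):
    """Hypothetical score: data ~ N(0, I) perturbed by N(0, sigma^2 I),
    so the perturbed density is N(0, (1 + sigma^2) I)."""
    return -x / (1.0 + sigma**2)


def annealed_langevin(x, sigmas, score_fn, eps=8e-6, steps_per_level=150, rng=None):
    """Plain annealed Langevin dynamics (NCSN-style), without any PID control."""
    rng = np.random.default_rng() if rng is None else rng
    for sigma in sigmas:
        # Step size scaled per noise level: alpha_i = eps * sigma_i^2 / sigma_L^2.
        alpha = eps * (sigma / sigmas[-1]) ** 2
        for _ in range(steps_per_level):
            noise = rng.standard_normal(x.shape)
            x = x + 0.5 * alpha * score_fn(x, sigma) + np.sqrt(alpha) * noise
    return x


rng = np.random.default_rng(0)
sigmas = geometric_sigmas()                      # L = 8 levels, 20 down to 0.01
x0 = rng.uniform(-8.0, 8.0, size=(1280, 2))      # 1280 initial samples in the square
samples = annealed_langevin(x0, sigmas, toy_score, rng=rng)  # T = 150 steps per level
```

Swapping `toy_score` for a trained score network would give the usual baseline sampler; the hyperparameters the paper actually used for each task are those in its Tables 4 and 5, which are not shown here.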