Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

SD-KDE: Score-Debiased Kernel Density Estimation

Authors: Elliot Epstein, Rajat Vadiraj Dwaraknath, Thanawat Sornwanee, John Winnicki, Jerry Liu

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments on synthetic tasks in 1D, 2D and on MNIST demonstrate that our proposed SD-KDE method significantly reduces the mean integrated squared error compared to the standard Silverman KDE, even with noisy estimates in the score function.
Researcher Affiliation	Academia	Stanford University, Stanford, CA 94305, USA
Pseudocode	Yes	Algorithm 1 Score-Debiased Kernel Density Estimation. Require: data $\{x_i\}_{i=1}^{n}$, score estimator $\hat{s}$, kernel $K$, KDE bandwidth $h$, score step size $\delta$. 1: Take a single step along the score function: $\tilde{x}_i \leftarrow x_i + \delta\,\hat{s}(x_i)$ for $i = 1, \dots, n$. 2: Compute the debiased kernel density estimate: $\hat{p}(x) = \frac{1}{n h^d} \sum_{i=1}^{n} K\!\left(\frac{x - \tilde{x}_i}{h}\right)$.
Open Source Code	Yes	We open-source our implementation of SD-KDE at https://github.com/Elliotepsteino/SD-KDE.
Open Datasets	Yes	In this study, we follow a similar experimental setup to Liu et al. (2020) and explore the relationship between generated image quality and estimated density using the MNIST dataset, a widely recognized benchmark comprising 70,000 grayscale images (28×28 pixels) of handwritten digits (Le Cun & Cortes, 2010).
Dataset Splits	No	The paper does not explicitly provide training/test/validation dataset splits. It mentions 'We sample data from three mixtures' for the 1D synthetic experiments and 'We trained a DDPM on this dataset' for MNIST, implying a training set, but gives no concrete percentages, sample counts, or references to predefined splits.
Hardware Specification	Yes	Most of the experiments in the paper were conducted on a Linux cluster with 5 NVIDIA RTX A6000 GPUs, each with 49140 MB memory, running on CUDA Version 12.5. The cluster has 256 AMD EPYC 7763 64-Core Processor CPUs. Some experiments were also conducted on a MacBook Air (2022) equipped with an Apple M2 chip and 16 GB of unified memory.
Software Dependencies	Yes	Most of the experiments in the paper were conducted on a Linux cluster with 5 NVIDIA RTX A6000 GPUs, each with 49140 MB memory, running on CUDA Version 12.5.
Experiment Setup	Yes	For the diffusion model architecture, we use a 3-layer MLP with hidden dimension 512, and we train the model with Adam for 1500 steps. We use 1000 diffusion steps during training. ... For this experiment, we will sample 1000 points and hold the bandwidth constant at 0.15. ... score-based correction step, with scale 0.015 decaying at a rate of 0.15 at each iteration.
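A minimal one-dimensional NumPy sketch of Algorithm 1 follows. It rests on assumptions that are not stated in the entry above: a Gaussian kernel; the exact score of a hypothetical two-component Gaussian mixture standing in for the score estimator ŝ (the paper also studies noisy, learned scores); and a step size δ = h²/2. That step size comes from a first-order argument. A Gaussian KDE with bandwidth h has leading bias (h²/2)p''(x), while moving each sample by δ·d/dx log p(x) changes the sample density by roughly −δp''(x), so δ = h²/2 cancels the leading term. The mixture, the helper names, and the comparison against a Silverman rule-of-thumb KDE are illustrative only. They are not taken from the authors' repository.

```python
import numpy as np

# Hypothetical target density: a 1D two-component Gaussian mixture (illustrative only).
WEIGHTS = np.array([0.5, 0.5])
MEANS = np.array([-2.0, 2.0])
STDS = np.array([0.5, 1.0])


def _components(x):
    """Weighted component densities, shape (..., n_components)."""
    x = np.asarray(x, dtype=float)[..., None]
    return WEIGHTS * np.exp(-0.5 * ((x - MEANS) / STDS) ** 2) / (STDS * np.sqrt(2 * np.pi))


def mixture_pdf(x):
    """True density p(x) of the mixture."""
    return _components(x).sum(axis=-1)


def mixture_score(x):
    """Exact score d/dx log p(x); stands in for a learned estimator s_hat."""
    x_col = np.asarray(x, dtype=float)[..., None]
    comps = _components(x)
    return (comps * (-(x_col - MEANS) / STDS**2)).sum(axis=-1) / comps.sum(axis=-1)


def sample_mixture(n, rng):
    """Draw n i.i.d. samples from the mixture."""
    idx = rng.choice(len(WEIGHTS), size=n, p=WEIGHTS)
    return rng.normal(MEANS[idx], STDS[idx])


def standard_kde(x_eval, data, h):
    """Standard 1D KDE (d = 1) with a Gaussian kernel K and bandwidth h."""
    u = (x_eval[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))


def silverman_bandwidth(data):
    """Silverman's rule of thumb: 0.9 * min(std, IQR / 1.34) * n^(-1/5)."""
    std = data.std(ddof=1)
    iqr = np.subtract(*np.percentile(data, [75, 25]))
    return 0.9 * min(std, iqr / 1.34) * len(data) ** (-0.2)


def sd_kde(x_eval, data, h, score_fn, delta=None):
    """Sketch of Algorithm 1: one score step per sample, then a standard KDE."""
    if delta is None:
        delta = 0.5 * h**2  # assumed step size: cancels the leading h^2/2 * p'' bias
    shifted = data + delta * score_fn(data)  # step 1: x_tilde_i = x_i + delta * s_hat(x_i)
    return standard_kde(x_eval, shifted, h)  # step 2: KDE on the shifted points


def ise(p_hat, p_true, grid):
    """Integrated squared error on a uniform grid (Riemann sum)."""
    return np.sum((p_hat - p_true) ** 2) * (grid[1] - grid[0])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = sample_mixture(1000, rng)
    grid = np.linspace(-6.0, 7.0, 2001)
    p_true = mixture_pdf(grid)
    h = silverman_bandwidth(data)
    err_kde = ise(standard_kde(grid, data, h), p_true, grid)
    err_sd = ise(sd_kde(grid, data, h, mixture_score), p_true, grid)
    print(f"h={h:.3f}  ISE(KDE)={err_kde:.5f}  ISE(SD-KDE)={err_sd:.5f}")
```

Both estimators share the same Silverman bandwidth here, so the comparison isolates the effect of the score step. Because the leading bias of the debiased estimator is of higher order in h, SD-KDE may benefit from a larger bandwidth than Silverman's rule suggests. To mimic noisy scores, replace `mixture_score` with a perturbed version.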