Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-based Decoding

Authors: Xiner Li, Yulai Zhao, Chenyu Wang, Gabriele Scalia, Gokcen Eraslan, Surag Nair, Tommaso Biancalani, Shuiwang Ji, Aviv Regev, Sergey Levine, Masatoshi Uehara

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct experiments to assess the performance of our algorithm relative to baselines and its sensitivity to various hyperparameters. We start by outlining the experimental setup, and then present the results. The code is in Section G. ... Table 1: Top 10 and 50 quantiles of the generated samples (512) in terms of rewards (with 95% confidence intervals) and metrics quantifying naturalness (CLIP score and LL).
Researcher Affiliation	Collaboration	Xiner Li (1,4), Yulai Zhao (2), Chenyu Wang (3), Gabriele Scalia (4), Gokcen Eraslan (4), Surag Nair (4), Tommaso Biancalani (4), Shuiwang Ji (1), Aviv Regev (4), Sergey Levine (5), Masatoshi Uehara (6). Affiliations: (1) Texas A&M University, (2) Princeton University, (3) MIT, (4) Genentech, (5) UC Berkeley, (6) EvolutionaryScale
Pseudocode	Yes	Algorithm 1 SVDD (Soft Value-Based Decoding in Diffusion Models) ... Algorithm 2 SVDD with Replacement of Non-Promising Samples ... Algorithm 3 Value Function Estimation Using Monte Carlo Regression ... Algorithm 4 Value Function Estimation using Posterior Mean Approximation ... Algorithm 5 Guidance with Standard SMC (for reward maximization) ... Algorithm 6 SVDD (Soft Value-Based Decoding in Diffusion Models)
Open Source Code	Yes	The code is available at https://github.com/masa-ue/SVDD. ... We also provide an anonymous code link containing the implementation of our method and baselines: https://anonymous.4open.science/r/SVDD/.
Open Datasets	Yes	Molecules: We use GDSS (Jo et al., 2022), trained on ZINC-250k (Irwin & Shoichet, 2005), as the pre-trained diffusion model (T = 1000). ... DNAs (enhancers) and RNAs (5′ untranslated regions (UTRs)): We use the discrete diffusion model (Sahoo et al., 2024), trained on datasets from Gosai et al. (2023) for enhancers, and from Sample et al. (2019) for 5′ UTRs, as our pre-trained diffusion model (T = 128). ... We examine two publicly available large datasets: enhancers (n ~700k) (Gosai et al., 2023) and UTRs (n ~300k) (Sample et al., 2019), with activity levels measured by massively parallel reporter assays (MPRA) (Inoue et al., 2019).
Dataset Splits	No	The paper describes the datasets used for training pre-trained diffusion models (e.g., ZINC-250k), but does not specify the train/test/validation splits for these datasets. For its own evaluation, the paper generates samples and assesses their properties: "We report the top 10% quantile and median of rewards from the generated designs". It also states: "We compare the metrics using 512 molecules generated from the pre-trained GDSS model and from different methods optimizing QED". No specific dataset splits for training or evaluating the SVDD method itself are provided.
Hardware Specification	Yes	The deployment environments are Ubuntu 20.04 with 48 Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz, 755GB RAM, and graphics cards NVIDIA RTX 2080Ti. Each of our experiments is conducted on a single NVIDIA RTX 2080Ti or RTX A6000 GPU.
Software Dependencies	No	Our implementation is under the architecture of PyTorch (Paszke et al., 2019). ... We calculate QED and SA scores using the RDKit (Landrum et al., 2016) library. We use the docking program QuickVina 2 (Alhossary et al., 2015). While these software components are mentioned with citations, explicit version numbers (e.g., PyTorch 1.9, RDKit 2020.03) are not provided.
Experiment Setup	Yes	For example, we use α = 0.01 for biological sequences and images and α = 0.05 for molecules. ... We generally set M = 20 for images and M = 10 for other domains. ... For images, we use standard CNNs for this purpose, with the same architecture as the reward model. For molecular tasks, we use a Graph Isomorphism Network (GIN) model ... dimension of the hidden layer is 300. The number of convolutional layers in the GIN model is selected from the set {3, 5}; and we select the maximum number of iterations from {300, 500, 1000}, the initial learning rate from {1e-3, 3e-3, 5e-3, 1e-4}, and the batch size from {32, 64, 128}. ... The percentage of samples replaced is a hyperparameter selected from {0.03, 0.04, 0.05} in (2).
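
The GIN search space quoted in the Experiment Setup row can be written out explicitly. The sketch below is illustrative only: the key names (num_conv_layers, max_iterations, learning_rate, batch_size), the HIDDEN_DIM constant, and the iter_configs helper are hypothetical and do not come from the paper or its repository. It also assumes an exhaustive grid, whereas the quoted text says only that values are "selected from" each set.

```python
from itertools import product

# Candidate values quoted in the Experiment Setup row for the GIN model.
# The dictionary keys are hypothetical names chosen for this sketch.
GIN_GRID = {
    "num_conv_layers": [3, 5],
    "max_iterations": [300, 500, 1000],
    "learning_rate": [1e-3, 3e-3, 5e-3, 1e-4],
    "batch_size": [32, 64, 128],
}

# The hidden-layer dimension is reported as a fixed value, not a searched one.
HIDDEN_DIM = 300


def iter_configs(grid):
    """Yield one dict per combination of candidate values (full Cartesian product)."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        config["hidden_dim"] = HIDDEN_DIM
        yield config


configs = list(iter_configs(GIN_GRID))
print(len(configs))  # 2 * 3 * 4 * 3 = 72 combinations
```

If the authors tuned each hyperparameter separately rather than jointly, the number of runs would be smaller than this product; the row does not say which.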