Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

System-Embedded Diffusion Bridge Models

Authors: Bartlomiej Sobieski, Matthew Tivnan, Yuang Wang, Siyeop Yoon, Pengfei Jin, Dufan Wu, Quanzheng Li, Przemyslaw Biecek

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	5 Experiments, Table 2: Quantitative comparison of SDB with the baselines across four inverse problems., Figure 2: Qualitative comparison of SDB (SB) with the best-performing baselines (bridge methods)., E.6 Ablation study on noise schedules
Researcher Affiliation	Academia	¹University of Warsaw, ²Harvard University, ³Massachusetts General Hospital, ⁴Warsaw University of Technology, Corresponding author at EMAIL
Pseudocode	Yes	Algorithm 1 SDB Training, Algorithm 2 SDB Sampling (Euler-Maruyama)
Open Source Code	Yes	We include the source code at https://github.com/sobieskibj/sdb.
Open Datasets	Yes	We evaluate SDB on four inverse problems with varying measurement system complexities, using original images at a resolution of 256×256. Building on prior work [Luo et al., 2023a, Yue et al., 2024], we first consider inpainting on CelebA-HQ [Karras et al., 2018]... we examine superresolution on DIV2K [Agustsson and Timofte, 2017, Timofte et al., 2017]... CT reconstruction on the RSNA Intracranial Hemorrhage dataset [Anouk Stein et al., 2019] and MRI reconstruction on the Br35H dataset [Merlin, 2022]... motion deblurring task on 128×128 flower images from the Flowers102 dataset [Nilsback and Zisserman, 2008].
Dataset Splits	Yes	Following standard evaluation practice [Luo et al., 2023a, Yue et al., 2024], we report perceptual scores (FID [Heusel et al., 2017], LPIPS [Zhang et al., 2018]) and reconstruction metrics (PSNR, SSIM)., each supervised method learns a mapping between signal samples and their PRs... we train score networks from scratch using the training hyperparameters and architecture of Luo et al. [2023a], with 256 training epochs for supervised methods and 512 for unsupervised ones
Hardware Specification	Yes	All experiments were conducted on a cluster of NVIDIA A100 GPUs, with each method trained using a single GPU.
Software Dependencies	No	We follow the training procedure proposed by Luo et al. [2023a], using the ADAM optimizer [Kingma and Ba, 2015]... or gradient-based operations available in autodifferentiation frameworks such as PyTorch
Experiment Setup	Yes	We follow the training procedure proposed by Luo et al. [2023a], using the ADAM optimizer [Kingma and Ba, 2015] with an initial learning rate of 1×10⁻⁴, no weight decay, and (β1, β2) = (0.9, 0.99). A multi-step learning rate scheduler is applied, halving the learning rate at the 36th, 60th, 72nd, and 90th epochs, as in the original work. All methods are trained using the ℓ1 loss function with a batch size of 8.
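The learning-rate schedule quoted in the Experiment Setup row can be made concrete with a small sketch. It assumes the scheduler is stepped once per epoch, that epochs are 0-indexed, and that a halving "at" a milestone takes effect once the epoch index reaches that milestone (the convention used by PyTorch's MultiStepLR). The function name `learning_rate` and these indexing conventions are illustrative assumptions, not details confirmed from the released code.

```python
from bisect import bisect_right

# Values quoted in the Experiment Setup row.
BASE_LR = 1e-4                 # initial learning rate
MILESTONES = [36, 60, 72, 90]  # epochs at which the learning rate is halved
GAMMA = 0.5                    # "halving" -> multiply by 0.5 at each milestone


def learning_rate(epoch: int, base_lr: float = BASE_LR,
                  milestones=MILESTONES, gamma: float = GAMMA) -> float:
    """Hypothetical step-decay learning rate for a 0-indexed epoch.

    Assumes MultiStepLR-style semantics: lr = base_lr * gamma ** k, where k is
    the number of milestones that are <= epoch.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    # bisect_right on the sorted milestone list counts milestones <= epoch.
    k = bisect_right(milestones, epoch)
    return base_lr * gamma ** k


if __name__ == "__main__":
    for e in (0, 35, 36, 60, 72, 90, 255):
        print(e, learning_rate(e))
```

Under the same assumptions, a PyTorch setup would pair `torch.optim.Adam(params, lr=1e-4, betas=(0.9, 0.99), weight_decay=0)` with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[36, 60, 72, 90], gamma=0.5)`. This mapping is a reading of the quoted description, not a statement about the authors' code.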