Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

A Data-Driven Prism: Multi-View Source Separation with Diffusion Model Priors

Authors: Sebastian Wagner-Carena, Aizhan Akhmetzhanova, Sydney Erickson

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the effectiveness of our method on a range of synthetic problems as well as real-world galaxy observations.
Researcher Affiliation	Academia	(1) Center for Computational Astrophysics, Flatiron Institute; (2) Center for Data Science, New York University; (3) Department of Physics, Harvard University; (4) Department of Physics, Stanford University
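Pseudocode	Yes	Algorithm 1: MVSS with Joint Diffusion. Input: dataset $D = \{ y^{\alpha}_{i_\alpha}, \{A^{\alpha\beta}_{i_\alpha}\}_{\beta=1}^{N_s}, \Sigma^{\alpha}_{i_\alpha} \}_{\alpha=1}^{N_{\text{views}}}$, number of sources $N_s$, number of views $N_{\text{views}}$, initial denoiser parameters $\Theta_0 = \{\theta^{\beta}_0\}_{\beta=1}^{N_s}$, number of EM rounds $K$. Output: trained diffusion model priors with denoiser parameters $\Theta_K$. for $k = 0$ to $K - 1$ do: foreach $(y^{\alpha}_{i_\alpha}, \{A^{\alpha\beta}_{i_\alpha}\}, \Sigma^{\alpha}_{i_\alpha}) \in D$ do: $\{x^{\beta}_{i_\alpha}\}_{\beta=1}^{N_s} \sim q_{\Theta_k}(\{x^{\beta}\} \mid y^{\alpha}_{i_\alpha}, \{A^{\alpha\beta}_{i_\alpha}\})$ // E step using equation 14; end. $\Theta_{k+1} = \arg\max_{\Theta} \sum_{\beta} \log q_{\theta^{\beta}}(x^{\beta}_{i_\alpha})$ // M step using equation 5; end. return $\Theta_K$.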
Open Source Code	Yes	All of the code to reproduce our method and experiments has been made public (footnote 4). Code: https://github.com/swagnercarena/DDPRISM
Open Datasets	Yes	The MNIST dataset is available under a CC BY-SA 3.0 license [58]. The ImageNet dataset is made available for non-commercial purposes [57]. We generate our grass images by taking random 28 × 28 pixel crops from ImageNet images with the grass label. We make the full dataset of 79k pristine galaxy images publicly available (footnote 7). https://doi.org/10.5281/zenodo.17159988
Dataset Splits	No	The paper describes generating datasets of specific sizes for different experiments (e.g., "dataset of size 216 for each view", "generate 32,768 observations for the grass view and 13,824 for the linear combination of digits and grass"), and a split of observations by resolution for the 'Downsampled' experiment ("one third of our observations are at full-resolution, one third are 2x downsampled, and one third are 4x downsampled"). However, it does not explicitly provide details about standard training/test/validation splits for its experiments, nor does it refer to predefined or standard splits in a way that would allow reproduction of data partitioning for model evaluation.
Hardware Specification	Yes	All timing was done on NVIDIA A100 GPUs with the exception of the galaxy images experiment that was run on H100s. Both 1D manifold experiments used one A100 (40GB) GPU, both Grassy MNIST experiments used four A100 (40GB) GPUs, and the Galaxy Image experiments used four H100 (80GB) GPUs.
Software Dependencies	No	The paper mentions using the 'astroquery package' and 'optuna', but does not provide specific version numbers for these or other key software components required for replication.
Experiment Setup	Yes	Table 2: Hyperparameters for denoiser training and sampling on the one-dimensional manifold experiments. Table 3: Hyperparameters for denoiser training and sampling for the Grassy MNIST experiments and the Galaxy experiment.
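
The EM structure quoted in the Pseudocode row (Algorithm 1) can be illustrated with a minimal sketch. This is not the authors' implementation, which is available in the linked repository. To keep the example self-contained, the diffusion model priors are swapped for Gaussian priors, so the E step (posterior sampling of all sources given one observation) is exact and the M step (maximizing the summed log-likelihood of the sampled sources) reduces to fitting a mean and covariance. The function names `e_step`, `m_step`, and `run_em`, and the dataset layout of `(y, A_list, Sigma)` tuples, are assumptions made for illustration.

```python
import numpy as np


def e_step(y, A_list, Sigma, mus, covs, rng):
    """Draw one joint posterior sample of all sources for one observation.

    Assumes y = sum_b A_list[b] @ x_b + noise, noise ~ N(0, Sigma), and
    Gaussian priors x_b ~ N(mus[b], covs[b]) (a stand-in for diffusion priors).
    """
    n_src, d = len(mus), mus[0].shape[0]
    A = np.hstack(A_list)                       # joint mixing matrix, (m, n_src*d)
    m0 = np.concatenate(mus)                    # joint prior mean
    P = np.zeros((n_src * d, n_src * d))        # block-diagonal joint prior covariance
    for b, C in enumerate(covs):
        P[b * d:(b + 1) * d, b * d:(b + 1) * d] = C
    S = A @ P @ A.T + Sigma                     # marginal covariance of y
    gain = np.linalg.solve(S, A @ P).T          # P A^T S^{-1}
    mean = m0 + gain @ (y - A @ m0)             # posterior mean
    cov = P - gain @ A @ P                      # posterior covariance
    cov = 0.5 * (cov + cov.T) + 1e-9 * np.eye(cov.shape[0])  # numerical symmetry
    x = rng.multivariate_normal(mean, cov)
    return np.split(x, n_src)                   # one sample per source


def m_step(samples, jitter=1e-6):
    """Maximum-likelihood Gaussian fit per source; samples[b] has shape (N, d)."""
    mus = [s.mean(axis=0) for s in samples]
    covs = [np.cov(s, rowvar=False, bias=True) + jitter * np.eye(s.shape[1])
            for s in samples]
    return mus, covs


def run_em(dataset, mus, covs, n_rounds, seed=0):
    """Alternate E and M steps for n_rounds over a list of (y, A_list, Sigma)."""
    rng = np.random.default_rng(seed)
    for _ in range(n_rounds):
        draws = [[] for _ in mus]
        for y, A_list, Sigma in dataset:        # E step over every observation
            for b, x in enumerate(e_step(y, A_list, Sigma, mus, covs, rng)):
                draws[b].append(x)
        mus, covs = m_step([np.array(d) for d in draws])  # M step
    return mus, covs
```

In this sketch, a view that does not contain a source is encoded by passing a zero matrix for that source in `A_list`; its draws for that source then come from the current prior. With a diffusion prior, the exact Gaussian posterior draw in `e_step` would be replaced by an approximate posterior sampler, and `m_step` by denoiser training.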