Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Learning normalized image densities via dual score matching

Authors: Florentin Guth, Zahra Kadkhodaie, Eero P. Simoncelli

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We train an energy network with this dual score matching objective on the ImageNet64 dataset, and obtain a cross-entropy (negative log likelihood) value comparable to the state of the art. We further validate our approach by showing that our energy model strongly generalizes: log probabilities estimated with two networks trained on nonoverlapping data subsets are nearly identical.
Researcher Affiliation	Academia	Florentin Guth Center for Data Science, New York University Flatiron Institute, Simons Foundation EMAIL Zahra Kadkhodaie Flatiron Institute, Simons Foundation EMAIL Eero P. Simoncelli New York University Flatiron Institute, Simons Foundation EMAIL
Pseudocode	No	The paper describes the methodology and architecture in detail using mathematical equations and textual descriptions, but it does not include an explicit pseudocode block or algorithm figure.
Open Source Code	Yes	Pre-trained models and software for running all experiments are available at https://github.com/FlorentinGuth/DualScoreMatching.
Open Datasets	Yes	We train our energy model on ImageNet64 (Russakovsky et al., 2015; Chrabaszcz et al., 2017)
Dataset Splits	Yes	We train an energy network with this dual score matching objective on the ImageNet64 dataset... The previous section demonstrated that our energy-based model achieves near state-of-the-art NLL on ImageNet64. That is, the model on average assigns high probability to a set of held-out test images. To this end, we use the strong generalization test developed in Kadkhodaie et al. (2024). We partition the training data into two non-overlapping sets, train a separate energy-based model on each set, and then compare the energies computed by these two models on images from both training subsets. Figure 2 shows the results of this experiment. The two models assign very different probabilities to the same image when the training set size, N, is small. But they converge gradually and compute nearly the same values at N = 10^5.
Hardware Specification	Yes	All models are trained on a single NVIDIA H100 GPU, which takes about 5 days for ImageNet64.
Software Dependencies	No	The paper mentions using the Adam optimizer and Fourier features for time embedding but does not specify version numbers for any software libraries or programming languages.
Experiment Setup	Yes	All models are trained for 1M steps, with a batch size of 128. We use the Adam optimizer with default parameters and an initial learning rate of 0.0005 (except for the generalization experiments which used a learning rate of 0.0002) that is halved every 100,000 steps. In our experiments we use t_min = 10^-9 and t_max = 10^3, and the training image intensities are rescaled to have values in [0, 1]. A time embedding e(t) ∈ R^256 is computed with Fourier features cos(ω_k t), sin(ω_k t) (we use 32 frequencies (ω_k)_k that are linearly spaced in the log domain and ranging from 1/t_max to 1/t_min) followed by a shallow MLP.
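
The Experiment Setup entry above can be illustrated with a minimal Python sketch of the step-wise learning-rate schedule and the log-spaced Fourier features. This is not the authors' code. The function names, the exact step at which each halving takes effect, the inclusion of both endpoint frequencies, and the ordering of the cosine and sine terms are all assumptions made for illustration. The shallow MLP that maps the features to R^256 is omitted.

```python
import math

# Values quoted in the Experiment Setup entry.
T_MIN = 1e-9
T_MAX = 1e3
NUM_FREQS = 32


def learning_rate(step, initial_lr=5e-4, halve_every=100_000):
    """Halve the learning rate every `halve_every` steps.

    Assumption: the halving takes effect exactly at multiples of
    `halve_every` (floor division); the source does not say.
    """
    return initial_lr * 0.5 ** (step // halve_every)


def log_spaced_frequencies(num=NUM_FREQS, t_min=T_MIN, t_max=T_MAX):
    """Frequencies linearly spaced in the log domain, from 1/t_max to 1/t_min.

    Assumption: both endpoints are included.
    """
    lo = math.log(1.0 / t_max)
    hi = math.log(1.0 / t_min)
    return [math.exp(lo + (hi - lo) * k / (num - 1)) for k in range(num)]


def fourier_features(t, freqs):
    """Return [cos(w_k t) for all k] followed by [sin(w_k t) for all k].

    Assumption: the cos/sin ordering is arbitrary; the source only lists
    the two families of features.
    """
    return [math.cos(w * t) for w in freqs] + [math.sin(w * t) for w in freqs]
```

With 32 frequencies the feature vector has 64 entries. Under the floor-division assumption, the learning rate after 1M steps has been halved ten times.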