Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Score-based Generative Modeling Secretly Minimizes the Wasserstein Distance
Authors: Dohyun Kwon, Ying Fan, Kangwook Lee
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our numerical experiments support our findings. By analyzing our upper bounds, we provide a few techniques to obtain tighter upper bounds. |
| Researcher Affiliation | Academia | Dohyun Kwon, Ying Fan, Kangwook Lee; University of Wisconsin-Madison |
| Pseudocode | No | No pseudocode or algorithm block was found in the paper. |
| Open Source Code | Yes | Code is available at https://github.com/UW-Madison-Lee-Lab/score-wasserstein. |
| Open Datasets | Yes | Here we adopt three 2D datasets for simulation: One cluster Gaussian N(0, 0.1I), two moons in [28], and four clusters Gaussian mixture N((±0.5, ±0.5), 0.01I) with equal weights for each cluster. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with percentages, sample counts, or references to predefined splits for the datasets used. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, or memory) used for running the experiments are mentioned. |
| Software Dependencies | No | The paper mentions software like AdamW, POT, and scikit-learn, but does not specify their version numbers. |
| Experiment Setup | Yes | We use a 4-layer neural network as the score matching model, with ReLU nonlinearity and skip-connection at the final output. Each layer is composed of a linear layer with 64 hidden neurons and an embedding layer for 10 timesteps. For optimizer, we use AdamW [22] with learning rate = 0.001 and weight decay coefficient 0.01. For loss function, we use J_DSM with λ(t) = g(t)^2 and batch size = 128. |
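
The two Gaussian datasets quoted in the Open Datasets row can be sketched with the Python standard library alone. This is a minimal illustration and is not taken from the authors' repository. The function names, the seed, and the sample counts are hypothetical. The four cluster centers are assumed to be every sign combination of (0.5, 0.5). The two-moons dataset is omitted because the table does not state its parameters.

```python
import math
import random

# Assumed centers for the four-cluster mixture: all sign combinations of (0.5, 0.5).
FOUR_CENTERS = [(0.5, 0.5), (0.5, -0.5), (-0.5, 0.5), (-0.5, -0.5)]


def sample_one_cluster(n, rng):
    """Draw n 2D points from N(0, 0.1 I)."""
    std = math.sqrt(0.1)  # covariance 0.1 * I gives a per-axis std of sqrt(0.1)
    return [(rng.gauss(0.0, std), rng.gauss(0.0, std)) for _ in range(n)]


def sample_four_clusters(n, rng, centers=FOUR_CENTERS):
    """Draw n 2D points from an equal-weight mixture of N(center, 0.01 I)."""
    std = math.sqrt(0.01)  # covariance 0.01 * I gives a per-axis std of 0.1
    points = []
    for _ in range(n):
        cx, cy = rng.choice(centers)  # equal weights: pick a center uniformly
        points.append((rng.gauss(cx, std), rng.gauss(cy, std)))
    return points


rng = random.Random(0)  # fixed seed (hypothetical choice) for repeatable draws
one_cluster = sample_one_cluster(1000, rng)
four_clusters = sample_four_clusters(1000, rng)
```

Each call returns a list of `(x, y)` tuples, which can be converted to an array or tensor before training a score model with the settings listed in the Experiment Setup row.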