Generative Modeling by Estimating Gradients of the Data Distribution

Authors: Yang Song, Stefano Ermon

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our models produce samples comparable to GANs on MNIST, CelebA and CIFAR-10 datasets, achieving a new state-of-the-art inception score of 8.87 on CIFAR-10. Additionally, we demonstrate that our models learn effective representations via image inpainting experiments. Experimentally, we demonstrate the efficacy of our approach on MNIST, CelebA [34], and CIFAR-10 [31]. We show that the samples look comparable to those generated from modern likelihood-based models and GANs. On CIFAR-10, our model sets the new state-of-the-art inception score of 8.87 for unconditional generative models, and achieves a competitive FID score of 25.32. We show that the model learns meaningful representations of the data by image inpainting experiments. For quantitative evaluation, we report inception [48] and FID [20] scores on CIFAR-10 in Tab. 1.
Researcher Affiliation | Academia | Yang Song, Stanford University (yangsong@cs.stanford.edu); Stefano Ermon, Stanford University (ermon@cs.stanford.edu)
Pseudocode | Yes | Algorithm 1 (annealed Langevin dynamics), reproduced below; a runnable Python sketch appears after the table.
    Algorithm 1 Annealed Langevin dynamics.
    Require: {σ_i}_{i=1}^L, ε, T
    1: Initialize x̃_0
    2: for i ← 1 to L do
    3:     α_i ← ε · σ_i² / σ_L²    ▷ α_i is the step size.
    4:     for t ← 1 to T do
    5:         Draw z_t ∼ N(0, I)
    6:         x̃_t ← x̃_{t−1} + (α_i / 2) s_θ(x̃_{t−1}, σ_i) + √α_i z_t
    7:     end for
    8:     x̃_0 ← x̃_T
    9: end for
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the described methodology.
Open Datasets | Yes | We use MNIST, CelebA [34], and CIFAR-10 [31] datasets in our experiments. For CelebA, the images are first center-cropped to 140 × 140 and then resized to 32 × 32. All images are rescaled so that pixel values are in [0, 1]. (A sketch of this preprocessing appears after the table.)
Dataset Splits | No | The paper mentions using MNIST, CelebA, and CIFAR-10 datasets but does not explicitly provide details about the training, validation, or test data splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, or memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments.
Experiment Setup | Yes | We choose L = 10 different standard deviations such that {σ_i}_{i=1}^L is a geometric sequence with σ_1 = 1 and σ_10 = 0.01. When using annealed Langevin dynamics for image generation, we choose T = 100 and ε = 2 × 10⁻⁵, and use uniform noise as our initial samples. We found the results are robust w.r.t. the choice of T, and ε between 5 × 10⁻⁶ and 5 × 10⁻⁵ generally works fine. In the experiments, our model s_θ(x, σ) combines the architecture design of U-Net [46] with dilated/atrous convolution [64, 65, 8], both of which have been proved very successful in semantic segmentation. In addition, we adopt instance normalization in our score network, inspired by its superior performance in some image generation tasks [57, 13, 23], and we use a modified version of conditional instance normalization [13] to provide conditioning on σ_i. (The noise schedule is reconstructed in a snippet after the table.)
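The Algorithm 1 pseudocode quoted in the Pseudocode row maps directly onto a short sampler. Below is a minimal PyTorch sketch under stated assumptions: the score network is called as score_net(x, i) with a noise-level index i (the paper conditions on σ_i, but its exact interface is not given), and the batch shape is illustrative only.

```python
import torch

def annealed_langevin_dynamics(score_net, sigmas, eps=2e-5, T=100,
                               shape=(64, 3, 32, 32), device="cpu"):
    """Minimal sketch of Algorithm 1 (annealed Langevin dynamics).

    Assumes score_net(x, i) returns the score estimate s_theta(x, sigma_i)
    for the i-th noise level; sigmas is a decreasing sequence of noise levels.
    """
    x = torch.rand(shape, device=device)  # the paper initializes from uniform noise
    for i, sigma in enumerate(sigmas):
        alpha = eps * sigma**2 / sigmas[-1]**2  # step size annealed with sigma_i
        for _ in range(T):
            z = torch.randn_like(x)
            with torch.no_grad():
                score = score_net(x, i)
            # Langevin update: x <- x + (alpha / 2) * score + sqrt(alpha) * z
            x = x + (alpha / 2) * score + alpha**0.5 * z
    return x
```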
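The CelebA preprocessing quoted in the Open Datasets row (center-crop to 140 × 140, resize to 32 × 32, rescale to [0, 1]) can be expressed with standard torchvision transforms. This is a sketch assuming PIL-image inputs, not the authors' released pipeline.

```python
from torchvision import transforms

# Sketch of the quoted CelebA preprocessing: center-crop to 140x140,
# resize to 32x32, and rescale pixel values to [0, 1].
celeba_transform = transforms.Compose([
    transforms.CenterCrop(140),
    transforms.Resize(32),
    transforms.ToTensor(),  # converts a PIL image to a float tensor in [0, 1]
])
```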
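The noise schedule in the Experiment Setup row is fully specified and easy to reconstruct; numpy.geomspace is one convenient way to build the geometric sequence (a convenience choice, not necessarily what the authors used).

```python
import numpy as np

# Geometric sequence of L = 10 noise levels, from sigma_1 = 1.0 down to sigma_10 = 0.01.
L = 10
sigmas = np.geomspace(1.0, 0.01, num=L)

# Annealed Langevin dynamics hyperparameters quoted above.
T = 100      # Langevin steps per noise level
eps = 2e-5   # base step size; the paper reports robustness for eps in [5e-6, 5e-5]
```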