Maximum Likelihood Training of Score-Based Diffusion Models
Authors: Yang Song, Conor Durkan, Iain Murray, Stefano Ermon
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically observe that maximum likelihood training consistently improves the likelihood of score-based diffusion models across multiple datasets, stochastic processes, and model architectures. We empirically test the performance of likelihood weighting, importance sampling and variational dequantization across multiple architectures of score-based models, SDEs, and datasets. In particular, we consider DDPM++ (Baseline in Table 2) and DDPM++ (deep) (Deep in Table 2) models with VP and subVP SDEs [48] on CIFAR-10 [28] and ImageNet 32×32 [55] datasets. We omit experiments on the VE SDE since (i) under this SDE our likelihood weighting is the same as the original weighting in [48]; (ii) we empirically observe that the best VE SDE model achieves around 3.4 bits/dim on CIFAR-10 in our experiments, which is significantly worse than other SDEs. For each experiment, we report E[−log p_θ^ODE(x)] (Negative log-likelihood in Table 2), and the upper bound E[L_θ^DSM(x)] (Bound in Table 2). In addition, we report FID scores [17] for samples from p_θ^ODE, produced by solving the corresponding ODE with the Dormand-Prince RK45 [14] solver. |
| Researcher Affiliation | Academia | Yang Song Computer Science Department Stanford University yangsong@cs.stanford.edu Conor Durkan School of Informatics University of Edinburgh conor.durkan@ed.ac.uk Iain Murray School of Informatics University of Edinburgh i.murray@ed.ac.uk Stefano Ermon Computer Science Department Stanford University ermon@cs.stanford.edu |
| Pseudocode | No | The paper describes methods and processes through text and mathematical equations, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code is released at https://github.com/yang-song/score_flow. |
| Open Datasets | Yes | We empirically test the performance of likelihood weighting, importance sampling and variational dequantization across multiple architectures of score-based models, SDEs, and datasets. In particular, we consider DDPM++ (Baseline in Table 2) and DDPM++ (deep) (Deep in Table 2) models with VP and subVP SDEs [48] on CIFAR-10 [28] and ImageNet 32×32 [55] datasets. |
| Dataset Splits | No | The paper mentions using CIFAR-10 and ImageNet 32x32 datasets and training steps/batch sizes, but it does not explicitly provide information on how the data was split into training, validation, and test sets (e.g., percentages or specific sample counts). |
| Hardware Specification | Yes | All models were trained on Google Cloud TPUs (v3-8 and v3-32). |
| Software Dependencies | No | The paper describes training details, but does not explicitly list specific software dependencies with version numbers (e.g., 'PyTorch 1.x', 'Python 3.x', or 'CUDA 11.x'). |
| Experiment Setup | Yes | For CIFAR-10 experiments, we train our models for 250,000 optimization steps, with a batch size of 256. For ImageNet 32×32 experiments, we train our models for 500,000 optimization steps, with a batch size of 256. For both datasets, we use the Adam optimizer [29] with β1 = 0.9, β2 = 0.999 and ϵ = 1e−8. The learning rate starts at 2e−4 and decays linearly to 1e−5 after 100,000 optimization steps. |
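The learning-rate schedule quoted in the Experiment Setup row can be written out explicitly. The sketch below is a minimal, framework-agnostic interpretation of that description, assuming plain linear interpolation from the initial rate to the final rate over the first 100,000 steps, held constant afterward; the function name and its keyword defaults are illustrative, not taken from the released code.

```python
def lr_schedule(step: int,
                lr_init: float = 2e-4,
                lr_final: float = 1e-5,
                decay_steps: int = 100_000) -> float:
    """Linear learning-rate decay as described in the experiment setup.

    Interpolates linearly from ``lr_init`` at step 0 to ``lr_final`` at
    ``decay_steps``, then stays at ``lr_final`` for all later steps.
    """
    frac = min(step / decay_steps, 1.0)  # progress through the decay phase
    return lr_init + frac * (lr_final - lr_init)


if __name__ == "__main__":
    for s in (0, 50_000, 100_000, 250_000):
        print(s, lr_schedule(s))
```

At step 0 this returns 2e−4, at step 50,000 the midpoint 1.05e−4, and from step 100,000 onward the floor value 1e−5, matching the schedule stated in the row above.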