Efficient Learning of Generative Models via Finite-Difference Score Matching

Authors: Tianyu Pang, Kun Xu, Chongxuan Li, Yang Song, Stefano Ermon, Jun Zhu

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we demonstrate that our methods produce results comparable to the gradient-based counterparts while being much more computationally efficient." and "In this section, we experiment on a diverse set of generative models, following the default settings in previous work [36, 57, 58]."
Researcher Affiliation | Collaboration | Tianyu Pang1, Kun Xu1, Chongxuan Li1, Yang Song2, Stefano Ermon2, Jun Zhu1; 1Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University; 2Department of Computer Science, Stanford University
Pseudocode | Yes | "Specifically, our FD approach only requires independent (unnormalized) likelihood function evaluations, which can be efficiently and synchronously executed in parallel with a simple implementation (pseudo code is in Appendix C.1)." and "Algorithm 1 Finite Difference Approximation of Directional Derivative" (Appendix C.1). A minimal sketch of this central-difference estimate is given after the table.
Open Source Code | Yes | "Our code is provided in https://github.com/taufikxu/FD-ScoreMatching."
Open Datasets | Yes | "We validate our methods on six datasets including MNIST [33], Fashion-MNIST [69], CelebA [38], CIFAR-10 [28], SVHN [44], and ImageNet [7]." and "Following the setting in Song et al. [58], we evaluate on three UCI datasets [2]."
Dataset Splits | No | "We report the negative log-likelihood (NLL) and the exact SM loss on the test set" (Table 1 caption). Test sets are mentioned, but specific train/validation/test split sizes (percentages or sample counts) are not stated in the paper.
Hardware Specification | No | "The function L_θ(x) is the log-density modeled by a deep EBM and trained on MNIST, while we use PyTorch [47] for automatic differentiation... When we parallelize the FD decomposition, the computing time is almost a constant w.r.t. the order T, as long as there is enough GPU memory." The paper mentions GPU memory but does not specify exact GPU models, CPU models, or other hardware details; a sketch of the batched evaluation this parallelization claim describes is given after the table.
Software Dependencies | No | "The function L_θ(x) is the log-density modeled by a deep EBM and trained on MNIST, while we use PyTorch [47] for automatic differentiation." Only PyTorch is mentioned, and its version number is not provided.
Experiment Setup | Yes | "Under each algorithm, we train the DKEF model for 500 epochs with the batch size of 200." (Table 1 caption); "trained for 300K iterations with the batch size of 64." (Table 2 caption); "We simply set ε = 0.1 to be a constant during training" (Section 6.1); and "The noise level {σ_i}_{i=1}^{10} is a geometric sequence with σ_1 = 1 and σ_10 = 0.01. When using the annealed Langevin dynamics for image generation, the number of iterations under each noise level is 100 with a uniform noise as the initial sample. We train the models on the CIFAR-10 dataset with the batch size of 128." (Section 6.4). A sketch of this noise schedule and sampling loop is given after the table.
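The Algorithm 1 quoted in the Pseudocode row is the paper's finite-difference approximation of a directional derivative: the projection v^T ∇_x log p_θ(x) is estimated from two likelihood evaluations via a central difference. A minimal PyTorch sketch of that idea follows; the names (fd_directional_derivative, log_density, eps) and the toy check are illustrative, not the authors' implementation from Appendix C.1.

```python
import torch

def fd_directional_derivative(log_density, x, v, eps=0.1):
    """Central-difference estimate of v^T grad_x log p(x).

    Uses only two (unnormalized) log-density evaluations, which are
    independent and can be executed in parallel.
    """
    return (log_density(x + eps * v) - log_density(x - eps * v)) / (2.0 * eps)

if __name__ == "__main__":
    # Toy check against autograd on a standard-normal log-density.
    log_p = lambda z: -0.5 * (z ** 2).sum(dim=-1)
    x = torch.randn(4, 2, requires_grad=True)
    v = torch.randn(4, 2)
    v = v / v.norm(dim=-1, keepdim=True)          # unit direction per sample
    fd = fd_directional_derivative(log_p, x, v, eps=0.1)
    exact = (torch.autograd.grad(log_p(x).sum(), x)[0] * v).sum(dim=-1)
    print(torch.allclose(fd, exact, atol=1e-3))   # True: exact for quadratics
```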
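The Hardware row quotes the claim that, once the FD decomposition is parallelized, runtime stays roughly constant in the order T given enough GPU memory. One way to realize this, shown below as an assumed sketch (not the authors' code), is to stack all 2T perturbed copies of the input into a single batch so that one forward pass replaces 2T sequential likelihood evaluations.

```python
import torch

def parallel_fd_evaluations(log_density, x, vs, eps=0.1):
    """Evaluate L(x + eps*v) and L(x - eps*v) for all T directions at once.

    x:  (B, D) batch of inputs
    vs: (T, B, D) perturbation directions
    Returns two (T, B) tensors of log-density values. Memory grows with T,
    but wall-clock time is dominated by a single batched forward pass.
    """
    T, B = vs.shape[0], x.shape[0]
    stacked = torch.cat([x.unsqueeze(0) + eps * vs,
                         x.unsqueeze(0) - eps * vs], dim=0)   # (2T, B, D)
    values = log_density(stacked.reshape(2 * T * B, -1)).reshape(2 * T, B)
    return values[:T], values[T:]
```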
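The Experiment Setup row quotes a geometric noise schedule from σ_1 = 1 down to σ_10 = 0.01, 100 annealed Langevin iterations per noise level, and a uniform-noise initial sample. The sketch below assembles those quoted pieces; score_fn, step_lr, and the NCSN-style step-size scaling are assumptions not stated in the quotes.

```python
import math
import torch

def annealed_langevin_sample(score_fn, shape, n_levels=10, sigma_first=1.0,
                             sigma_last=0.01, steps_per_level=100, step_lr=2e-5):
    # Geometric sequence of noise levels: sigma_1 = 1.0, ..., sigma_10 = 0.01.
    sigmas = torch.exp(torch.linspace(math.log(sigma_first),
                                      math.log(sigma_last), n_levels))
    x = torch.rand(shape)                 # uniform noise as the initial sample
    for sigma in sigmas:
        # Assumed step-size rule: shrink the step with the current noise level.
        alpha = step_lr * (sigma / sigmas[-1]) ** 2
        for _ in range(steps_per_level):  # 100 iterations per noise level
            noise = torch.randn_like(x)
            x = x + 0.5 * alpha * score_fn(x, sigma) + alpha.sqrt() * noise
    return x
```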