Pseudo-Spherical Contrastive Divergence

Authors: Lantao Yu, Jiaming Song, Yang Song, Stefano Ermon

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we demonstrate the effectiveness of PS-CD on several 1-D and 2-D synthetic datasets as well as commonly used image datasets.
Researcher Affiliation | Academia | Lantao Yu, Computer Science Department, Stanford University (lantaoyu@cs.stanford.edu); Jiaming Song, Computer Science Department, Stanford University (tsong@cs.stanford.edu); Yang Song, Computer Science Department, Stanford University (yangsong@cs.stanford.edu); Stefano Ermon, Computer Science Department, Stanford University (ermon@cs.stanford.edu)
Pseudocode | Yes | Algorithm 1: Pseudo-Spherical Contrastive Divergence.
1: Input: Empirical data distribution p_data. Pseudo-spherical scoring rule hyperparameter γ.
2: Initialize energy function E_θ.
3: repeat
4:   Draw a minibatch of samples {x_1^+, ..., x_N^+} from p_data.
5:   Draw a minibatch of samples {x_1^-, ..., x_N^-} from q_θ ∝ exp(-E_θ) (e.g., using Langevin dynamics with a sample replay buffer).
6:   Update the energy function by stochastic gradient descent:
     \nabla_\theta \hat{L}_N^{\gamma}(\theta; p) = \nabla_\theta \frac{1}{\gamma} \log\left( \frac{1}{N} \sum_{i=1}^N \exp(\gamma E_{\theta}(x^+_i)) \right) - \frac{\sum_{i=1}^N \exp(-\gamma E_{\theta}(x^-_i)) \nabla_{\theta} E_{\theta}(x^-_i)}{\sum_{i=1}^N \exp(-\gamma E_{\theta}(x^-_i))}
7: until convergence
Open Source Code | Yes | In Appendix A, we also provide a simple PyTorch implementation for stochastic gradient descent (SGD) with the gradient estimator in Equation (19). (A minimal surrogate-loss sketch of this update appears below the table.)
Open Datasets | Yes | To test the practical usefulness, we use MNIST [54], CIFAR-10 [48] and CelebA [57] in our experiments for modeling natural images.
Dataset Splits | Yes | For quantitative evaluation of the 2-D synthetic data experiments, we follow [79] and report the maximum mean discrepancy (MMD, [5]) between the generated samples and validation samples in Table 3 in App. D.1, which demonstrates that PS-CD outperforms its CD counterpart on all but the Funnel dataset. We conduct similar experiments on MNIST and CIFAR-10 datasets, where we use uniform noise as the contamination distribution and the contamination ratio is 0.1 (i.e., 10% of the images in the training set are replaced with random noise). (A generic MMD sketch appears below the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions using 'PyTorch' for implementation but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | More experimental details about the data processing, model architectures, sampling strategies and additional experimental results can be found in App. D. For CelebA, we use a simple 5-layer CNN architecture. For all experiments, we use Langevin dynamics with K=100 MCMC steps to sample from EBMs. We use the Adam optimizer with learning rate 1e-4 and batch size 64. (See the Langevin sampler sketch below the table.)
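
The update in Algorithm 1 can be realized as a surrogate loss whose autograd gradient coincides with the estimator above: the positive term is a log-mean-exp of γ-scaled energies, and the negative term uses self-normalized importance weights that are held fixed (detached) during backpropagation. The sketch below is our own illustration of that construction, not the paper's Appendix A code; the function name and the γ value in the usage comment are assumptions.

```python
import math
import torch

def pscd_surrogate_loss(energy_net, x_pos, x_neg, gamma):
    """Surrogate loss whose autograd gradient matches the PS-CD estimator.

    energy_net: module mapping a batch of inputs to energies, assumed shape (N,).
    x_pos: minibatch drawn from p_data.
    x_neg: minibatch drawn from q_theta ∝ exp(-E_theta), e.g. via Langevin dynamics.
    gamma: pseudo-spherical hyperparameter (gamma -> 0 recovers the CD/MLE gradient).
    """
    e_pos = energy_net(x_pos)
    e_neg = energy_net(x_neg)
    n = e_pos.shape[0]

    # Positive term: (1/gamma) * log((1/N) * sum_i exp(gamma * E(x_i^+))),
    # computed stably with logsumexp; its gradient is the self-normalized
    # exp(gamma * E)-weighted average of grad E over positive samples.
    pos_term = (torch.logsumexp(gamma * e_pos, dim=0) - math.log(n)) / gamma

    # Negative term: weights w_i = softmax(-gamma * E(x_i^-)) are detached,
    # so autograd reproduces sum_i w_i * grad E(x_i^-).
    w = torch.softmax(-gamma * e_neg, dim=0).detach()
    neg_term = (w * e_neg).sum()

    return pos_term - neg_term

# Illustrative training step (gamma=1.0 is an arbitrary choice here):
#   opt = torch.optim.Adam(energy_net.parameters(), lr=1e-4)
#   loss = pscd_surrogate_loss(energy_net, x_pos, x_neg, gamma=1.0)
#   opt.zero_grad(); loss.backward(); opt.step()
```

Detaching the importance weights is what makes the autograd gradient agree with the estimator in Algorithm 1; differentiating through the weights would yield a different (higher-variance) update.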
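The setup row reports Langevin dynamics with K=100 MCMC steps for drawing negative samples. A minimal sampler consistent with that description might look like the following; step_size and noise_scale are illustrative placeholders (the excerpt does not specify them), and x_init would typically come from the sample replay buffer mentioned in Algorithm 1.

```python
import torch

def langevin_sample(energy_net, x_init, n_steps=100, step_size=0.01, noise_scale=None):
    """Langevin dynamics toward q_theta ∝ exp(-E_theta); a hedged sketch."""
    if noise_scale is None:
        noise_scale = (2.0 * step_size) ** 0.5  # standard Langevin noise level
    x = x_init.clone().detach()
    for _ in range(n_steps):
        x.requires_grad_(True)
        energy = energy_net(x).sum()
        grad = torch.autograd.grad(energy, x)[0]
        # Gradient step toward low energy plus Gaussian noise.
        x = (x - step_size * grad + noise_scale * torch.randn_like(x)).detach()
    return x
```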
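The dataset-splits row evaluates the 2-D synthetic experiments via the maximum mean discrepancy between generated and validation samples. For reference, a generic biased MMD² estimate with an RBF kernel can be computed as below; the kernel choice and bandwidth are illustrative assumptions, not the paper's configuration.

```python
import torch

def mmd_rbf(x, y, bandwidth=1.0):
    """Biased MMD^2 estimate between samples x and y, each of shape (N, D)."""
    def kernel(a, b):
        # Pairwise squared distances, then Gaussian (RBF) kernel.
        d2 = torch.cdist(a, b, p=2).pow(2)
        return torch.exp(-d2 / (2.0 * bandwidth ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()
```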