Pseudo-Spherical Contrastive Divergence

Authors: Lantao Yu, Jiaming Song, Yang Song, Stefano Ermon

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we demonstrate the effectiveness of PS-CD on several 1-D and 2-D synthetic datasets as well as commonly used image datasets.
Researcher Affiliation | Academia | Lantao Yu, Computer Science Department, Stanford University (lantaoyu@cs.stanford.edu); Jiaming Song, Computer Science Department, Stanford University (tsong@cs.stanford.edu); Yang Song, Computer Science Department, Stanford University (yangsong@cs.stanford.edu); Stefano Ermon, Computer Science Department, Stanford University (ermon@cs.stanford.edu)
Pseudocode | Yes | Algorithm 1: Pseudo-Spherical Contrastive Divergence.
1: Input: Empirical data distribution p_data. Pseudo-spherical scoring rule hyperparameter γ.
2: Initialize energy function E_θ.
3: repeat
4:   Draw a minibatch of samples {x_1^+, ..., x_N^+} from p_data.
5:   Draw a minibatch of samples {x_1^-, ..., x_N^-} from q_θ ∝ exp(-E_θ) (e.g., using Langevin dynamics with a sample replay buffer).
6:   Update the energy function by stochastic gradient descent:
     \nabla_\theta \hat{L}_N^{\gamma}(\theta; p) = \nabla_\theta \frac{1}{\gamma} \log\left( \frac{1}{N} \sum_{i=1}^N \exp(\gamma E_{\theta}(x^+_i)) \right) - \frac{\sum_{i=1}^N \exp(-\gamma E_{\theta}(x^-_i)) \nabla_{\theta} E_{\theta}(x^-_i)}{\sum_{i=1}^N \exp(-\gamma E_{\theta}(x^-_i))}
7: until convergence
Open Source Code | Yes | In Appendix A, we also provide a simple PyTorch implementation for stochastic gradient descent (SGD) with the gradient estimator in Equation (19). (A minimal surrogate-loss sketch of this update appears below the table.)
Open Datasets | Yes | To test the practical usefulness, we use MNIST [54], CIFAR-10 [48] and CelebA [57] in our experiments for modeling natural images.
Dataset Splits | Yes | For quantitative evaluation of the 2-D synthetic data experiments, we follow [79] and report the maximum mean discrepancy (MMD, [5]) between the generated samples and validation samples in Table 3 in App. D.1, which demonstrates that PS-CD outperforms its CD counterpart on all but the Funnel dataset. We conduct similar experiments on MNIST and CIFAR-10 datasets, where we use uniform noise as the contamination distribution and the contamination ratio is 0.1 (i.e., 10% of the images in the training set are replaced with random noise). (A generic MMD sketch appears below the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions using 'PyTorch' for implementation but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | More experimental details about the data processing, model architectures, sampling strategies and additional experimental results can be found in App. D. For CelebA, we use a simple 5-layer CNN architecture. For all experiments, we use Langevin dynamics with K=100 MCMC steps to sample from EBMs. We use the Adam optimizer with learning rate 1e-4 and batch size 64. (See the Langevin sampler sketch below the table.)
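
The update in Algorithm 1 can be realized as a surrogate loss whose autograd gradient coincides with the estimator above: the positive term is a log-mean-exp of γ-scaled energies, and the negative term uses self-normalized importance weights that are held fixed (detached) during backpropagation. The sketch below is our own illustration of that construction, not the paper's Appendix A code; the function name and the γ value in the usage comment are assumptions.

```python
import math
import torch

def pscd_surrogate_loss(energy_net, x_pos, x_neg, gamma):
    """Surrogate loss whose autograd gradient matches the PS-CD estimator.

    energy_net: module mapping a batch of inputs to energies, assumed shape (N,).
    x_pos: minibatch drawn from p_data.
    x_neg: minibatch drawn from q_theta ∝ exp(-E_theta), e.g. via Langevin dynamics.
    gamma: pseudo-spherical hyperparameter (gamma -> 0 recovers the CD/MLE gradient).
    """
    e_pos = energy_net(x_pos)
    e_neg = energy_net(x_neg)
    n = e_pos.shape[0]

    # Positive term: (1/gamma) * log((1/N) * sum_i exp(gamma * E(x_i^+))),
    # computed stably with logsumexp; its gradient is the self-normalized
    # exp(gamma * E)-weighted average of grad E over positive samples.
    pos_term = (torch.logsumexp(gamma * e_pos, dim=0) - math.log(n)) / gamma

    # Negative term: weights w_i = softmax(-gamma * E(x_i^-)) are detached,
    # so autograd reproduces sum_i w_i * grad E(x_i^-).
    w = torch.softmax(-gamma * e_neg, dim=0).detach()
    neg_term = (w * e_neg).sum()

    return pos_term - neg_term

# Illustrative training step (gamma=1.0 is an arbitrary choice here):
#   opt = torch.optim.Adam(energy_net.parameters(), lr=1e-4)
#   loss = pscd_surrogate_loss(energy_net, x_pos, x_neg, gamma=1.0)
#   opt.zero_grad(); loss.backward(); opt.step()
```

Detaching the importance weights is what makes the autograd gradient agree with the estimator in Algorithm 1; differentiating through the weights would yield a different (higher-variance) update.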
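The setup row reports Langevin dynamics with K=100 MCMC steps for drawing negative samples. A minimal sampler consistent with that description might look like the following; step_size and noise_scale are illustrative placeholders (the excerpt does not specify them), and x_init would typically come from the sample replay buffer mentioned in Algorithm 1.

```python
import torch

def langevin_sample(energy_net, x_init, n_steps=100, step_size=0.01, noise_scale=None):
    """Langevin dynamics toward q_theta ∝ exp(-E_theta); a hedged sketch."""
    if noise_scale is None:
        noise_scale = (2.0 * step_size) ** 0.5  # standard Langevin noise level
    x = x_init.clone().detach()
    for _ in range(n_steps):
        x.requires_grad_(True)
        energy = energy_net(x).sum()
        grad = torch.autograd.grad(energy, x)[0]
        # Gradient step toward low energy plus Gaussian noise.
        x = (x - step_size * grad + noise_scale * torch.randn_like(x)).detach()
    return x
```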
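The dataset-splits row evaluates the 2-D synthetic experiments via the maximum mean discrepancy between generated and validation samples. For reference, a generic biased MMD² estimate with an RBF kernel can be computed as below; the kernel choice and bandwidth are illustrative assumptions, not the paper's configuration.

```python
import torch

def mmd_rbf(x, y, bandwidth=1.0):
    """Biased MMD^2 estimate between samples x and y, each of shape (N, D)."""
    def kernel(a, b):
        # Pairwise squared distances, then Gaussian (RBF) kernel.
        d2 = torch.cdist(a, b, p=2).pow(2)
        return torch.exp(-d2 / (2.0 * bandwidth ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()
```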