Pseudo-Spherical Contrastive Divergence
Authors: Lantao Yu, Jiaming Song, Yang Song, Stefano Ermon
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we demonstrate the effectiveness of PS-CD on several 1-D and 2-D synthetic datasets as well as commonly used image datasets. |
| Researcher Affiliation | Academia | Lantao Yu, Computer Science Department, Stanford University, lantaoyu@cs.stanford.edu; Jiaming Song, Computer Science Department, Stanford University, tsong@cs.stanford.edu; Yang Song, Computer Science Department, Stanford University, yangsong@cs.stanford.edu; Stefano Ermon, Computer Science Department, Stanford University, ermon@cs.stanford.edu |
| Pseudocode | Yes | Algorithm 1 Pseudo-Spherical Contrastive Divergence. 1: Input: Empirical data distribution $p_{\mathrm{data}}$; pseudo-spherical scoring rule hyperparameter $\gamma$. 2: Initialize energy function $E_\theta$. 3: repeat 4: Draw a minibatch of samples $\{x_1^+, \ldots, x_N^+\}$ from $p_{\mathrm{data}}$. 5: Draw a minibatch of samples $\{x_1^-, \ldots, x_N^-\}$ from $q_\theta \propto \exp(-E_\theta)$ (e.g., using Langevin dynamics with a sample replay buffer). 6: Update the energy function by stochastic gradient descent with $\nabla_\theta \hat{L}_N^{\gamma}(\theta; p) = \frac{1}{N} \sum_{i=1}^N \exp(\gamma E_\theta(x_i^+)) \left( \nabla_\theta E_\theta(x_i^+) - \frac{\sum_{i=1}^N \exp(-\gamma E_\theta(x_i^-)) \nabla_\theta E_\theta(x_i^-)}{\sum_{i=1}^N \exp(-\gamma E_\theta(x_i^-))} \right)$. 7: until convergence. (A PyTorch sketch of this update follows the table.) |
| Open Source Code | Yes | In Appendix A, we also provide a simple PyTorch implementation for stochastic gradient descent (SGD) with the gradient estimator in Equation (19). |
| Open Datasets | Yes | To test the practical usefulness, we use MNIST [54], CIFAR-10 [48] and CelebA [57] in our experiments for modeling natural images. |
| Dataset Splits | Yes | For quantitative evaluation of the 2-D synthetic data experiments, we follow [79] and report the maximum mean discrepancy (MMD, [5]) between the generated samples and validation samples in Table 3 in App. D.1, which demonstrates that PS-CD outperforms its CD counterpart on all but the Funnel dataset (a sketch of the MMD estimate appears after the table). We conduct similar experiments on MNIST and CIFAR-10 datasets, where we use uniform noise as the contamination distribution and the contamination ratio is 0.1 (i.e., 10% of the images in the training set are replaced with random noise). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'PyTorch' for implementation but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | More experimental details about the data processing, model architectures, sampling strategies and additional experimental results can be found in App. D. For CelebA, we use a simple 5-layer CNN architecture. For all experiments, we use Langevin dynamics with K=100 MCMC steps to sample from EBMs. We use the Adam optimizer with learning rate 1e-4 and batch size 64. (A sketch of such a Langevin sampler follows the table.) |
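
The update in line 6 of Algorithm 1 can be implemented as a surrogate loss whose autograd gradient coincides with the PS-CD estimator. The sketch below is our own minimal PyTorch rendering, not the authors' Appendix A code: `pscd_surrogate_loss` and `energy_net` are hypothetical names, and `energy_net` is assumed to map a batch of inputs to a vector of scalar energies.

```python
import torch

def pscd_surrogate_loss(energy_net, x_pos, x_neg, gamma):
    """Surrogate loss whose gradient matches the PS-CD estimator
    (all exponential weights are detached, i.e. treated as constants)."""
    e_pos = energy_net(x_pos).squeeze()  # energies of data samples, shape (N,)
    e_neg = energy_net(x_neg).squeeze()  # energies of model samples, shape (N,)

    # Positive-phase weights exp(gamma * E(x+)). Detached so autograd
    # differentiates only the energies they multiply. Subtracting a shared
    # constant inside exp() (e.g. e_pos.max()) rescales the whole gradient
    # by a positive factor and can be used for numerical stability.
    w_pos = torch.exp(gamma * e_pos).detach()

    # Self-normalized importance weights softmax(-gamma * E(x-)) over the
    # negative (model) samples, also detached.
    v_neg = torch.softmax(-gamma * e_neg, dim=0).detach()

    pos_term = (w_pos * e_pos).mean()
    neg_term = w_pos.mean() * (v_neg * e_neg).sum()
    return pos_term - neg_term
```

Because the detached weights act as constants, differentiating `pos_term - neg_term` reproduces the estimator in Algorithm 1; a training step would then be the usual `loss.backward()` and `optimizer.step()` with the Adam settings quoted in the setup row.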
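
The setup row quotes K=100 Langevin steps for drawing negative samples. A generic unadjusted Langevin dynamics sampler targeting $q_\theta \propto \exp(-E_\theta)$ is sketched below; `langevin_sample` is a hypothetical name, the step size is illustrative, and the paper's sample replay buffer is omitted.

```python
import torch

def langevin_sample(energy_net, x_init, n_steps=100, step_size=0.01):
    """Unadjusted Langevin dynamics for q(x) proportional to exp(-E(x)):
    x_{k+1} = x_k - (step_size / 2) * grad_x E(x_k) + sqrt(step_size) * noise.
    """
    x = x_init.detach().clone()
    for _ in range(n_steps):
        x.requires_grad_(True)
        grad = torch.autograd.grad(energy_net(x).sum(), x)[0]
        x = x.detach() - 0.5 * step_size * grad           # gradient step
        x = x + (step_size ** 0.5) * torch.randn_like(x)  # Gaussian noise
    return x.detach()
```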
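
For reference on the evaluation metric quoted in the dataset-splits row, a biased (V-statistic) squared-MMD estimate with an RBF kernel can be computed as below; `mmd_rbf` and the fixed bandwidth are our assumptions, and the paper follows [79] for its exact protocol.

```python
import torch

def mmd_rbf(x, y, bandwidth=1.0):
    """Biased (V-statistic) squared MMD between samples x: (n, d) and
    y: (m, d) under an RBF kernel with the given bandwidth."""
    def kernel(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * bandwidth ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()
```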