Unbiased Contrastive Divergence Algorithm for Training Energy-Based Latent Variable Models

Authors: Yixuan Qiu, Lingsong Zhang, Xiao Wang

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Rigorous theoretical analysis is developed to justify the proposed algorithm, and numerical experiments show that it significantly improves the existing method.
Researcher Affiliation | Academia | Yixuan Qiu, Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA (yixuanq@andrew.cmu.edu); Lingsong Zhang & Xiao Wang, Department of Statistics, Purdue University, West Lafayette, IN 47907, USA ({lingsong, wangxiao}@purdue.edu)
Pseudocode | Yes | Algorithm 1: Coupling method for the Gibbs sampler; Algorithm 2: UCD algorithm for estimating θ; Algorithm 3: Coupling method for RBM
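As a rough illustration of how the coupling listed in Algorithms 1-3 is typically used, the sketch below follows the general telescoping construction for unbiased MCMC estimators (a lagged pair of coupled chains run until they meet), not the authors' released code; `gibbs_step`, `coupled_gibbs_step`, and `grad_energy` are hypothetical model-specific callables assumed for illustration only.

```python
import numpy as np

def ucd_negative_phase(v0, gibbs_step, coupled_gibbs_step, grad_energy, max_iter=10000):
    """Coupled-chain sketch of the model-expectation term in the CD gradient.

    Two chains are advanced jointly, with `xi` always one step ahead of `eta`.
    Once the chains meet, every later correction term is zero, so the finite
    telescoping sum returned here is an unbiased estimate of the expectation
    that ordinary CD-k only approximates.
    """
    xi = gibbs_step(v0)           # X_1: one Gibbs step from the starting sample
    eta = np.asarray(v0)          # Y_0: the lagged chain starts at the same sample
    estimate = grad_energy(xi)    # the ordinary CD-1 term

    for _ in range(max_iter):
        if np.array_equal(xi, eta):               # chains have coupled; stop correcting
            break
        xi, eta = coupled_gibbs_step(xi, eta)     # joint kernel preserving both marginals
        estimate = estimate + grad_energy(xi) - grad_energy(eta)  # bias-correction term
    return estimate
```

In a training loop, this estimate would stand in for the negative-phase (model) term of the log-likelihood gradient that CD-k normally truncates; the data-phase term is unchanged.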
Open Source Code | Yes | The implementation of the UCD algorithm is available at https://github.com/yixuan/cdtau.
Open Datasets | Yes | Next we consider the Fashion-MNIST data set, a replacement for the well-known but overused MNIST data set of handwritten digits (Le Cun et al., 1990).
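The Fashion-MNIST data set quoted above is publicly available. As one common way to obtain it (not how the paper's own experiments load it, since those use the cdtau code base), the snippet below uses torchvision; the root path is an arbitrary choice.

```python
# Download Fashion-MNIST and load it as tensors of 28x28 grayscale images.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()
train_set = datasets.FashionMNIST(root="./data", train=True,
                                  download=True, transform=to_tensor)
test_set = datasets.FashionMNIST(root="./data", train=False,
                                 download=True, transform=to_tensor)
print(len(train_set), len(test_set))  # 60000 and 10000 in the standard release
```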
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits (e.g., percentages, sample counts, or specific split files). It mentions using datasets for training but not the methodology for splitting them.
Hardware Specification | Yes | All experiments in this article were run on an Intel Xeon Gold 6126 processor with 12 cores and 24 threads.
Software Dependencies | No | The paper mentions the "Open BLAS library" and "Open MP" but does not provide specific version numbers for these software components, which are necessary for a reproducible description of ancillary software.
Experiment Setup | Yes |
- BAS data: "In our study, k is set to 1 for CD (more experiments with larger k are given in Appendix B.1), and each algorithm is run for 100 times, accounting for the randomness in the training process. A common learning rate α = 0.01 is set, and 1000 parallel Markov chains are used to approximate the gradient in each iteration."
- Simulated RBM data: "We use a common learning rate α = 0.2 and 1000 Markov chains in each iteration for all three algorithms."
- Fashion-MNIST data: "...train the model with different algorithms using a mini-batch size of 1000 and a learning rate α = 0.1. For each training algorithm, 1000 parallel Markov chains are used to compute the gradient."
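For convenience, the reported settings above can be gathered into a single configuration object; the sketch below only restates the quoted values, and the key names are illustrative rather than taken from the paper or its code.

```python
# Reported hyperparameters per data set, as quoted in the Experiment Setup row.
# Key names are hypothetical; the numeric values come from the paper's text.
reported_setup = {
    "BAS": {
        "cd_k": 1,               # k = 1 for CD; larger k explored in Appendix B.1
        "repetitions": 100,      # each algorithm run 100 times
        "learning_rate": 0.01,
        "parallel_chains": 1000,
    },
    "simulated_RBM": {
        "learning_rate": 0.2,
        "parallel_chains": 1000,
    },
    "Fashion-MNIST": {
        "mini_batch_size": 1000,
        "learning_rate": 0.1,
        "parallel_chains": 1000,
    },
}
```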