Unbiased Contrastive Divergence Algorithm for Training Energy-Based Latent Variable Models
Authors: Yixuan Qiu, Lingsong Zhang, Xiao Wang
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Rigorous theoretical analysis is developed to justify the proposed algorithm, and numerical experiments show that it significantly improves the existing method. |
| Researcher Affiliation | Academia | Yixuan Qiu, Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA (yixuanq@andrew.cmu.edu); Lingsong Zhang & Xiao Wang, Department of Statistics, Purdue University, West Lafayette, IN 47907, USA ({lingsong, wangxiao}@purdue.edu) |
| Pseudocode | Yes | Algorithm 1: Coupling method for the Gibbs sampler; Algorithm 2: UCD algorithm for estimating θ; Algorithm 3: Coupling method for RBM (an illustrative sketch of the coupling idea follows the table). |
| Open Source Code | Yes | The implementation of the UCD algorithm is available at https://github.com/yixuan/cdtau. |
| Open Datasets | Yes | Next we consider the Fashion-MNIST data set, a replacement for the well-known but overused MNIST data set of handwritten digits (Le Cun et al., 1990). |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits (e.g., percentages, sample counts, or specific split files). It mentions using datasets for training but not the methodology for splitting them. |
| Hardware Specification | Yes | All experiments in this article were run on an Intel Xeon Gold 6126 processor with 12 cores and 24 threads. |
| Software Dependencies | No | The paper mentions the OpenBLAS library and OpenMP but does not provide version numbers for these software components, which are necessary for a reproducible description of ancillary software. |
| Experiment Setup | Yes | BAS data: "In our study, k is set to 1 for CD (more experiments with larger k are given in Appendix B.1), and each algorithm is run 100 times, accounting for the randomness in the training process. A common learning rate α = 0.01 is set, and 1000 parallel Markov chains are used to approximate the gradient in each iteration." Simulated RBM data: "We use a common learning rate α = 0.2 and 1000 Markov chains in each iteration for all three algorithms." Fashion-MNIST data: "...train the model with different algorithms using a mini-batch size of 1000 and a learning rate α = 0.1. For each training algorithm, 1000 parallel Markov chains are used to compute the gradient." |
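
For orientation, the Experiment Setup row above quotes CD with k = 1, a learning rate α = 0.01, and 1000 parallel Markov chains. The following is a minimal NumPy sketch of a standard (biased) CD-k gradient step for a Bernoulli RBM under those settings; the function names, network sizes, and random seed are illustrative assumptions and are not taken from the authors' cdtau code.

```python
import numpy as np

# Minimal sketch of a CD-k update for a Bernoulli RBM, using the
# hyperparameters quoted in the Experiment Setup row (k = 1,
# learning rate 0.01, 1000 parallel Gibbs chains).  Names and sizes
# are illustrative; this is not the authors' cdtau implementation.

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b, c):
    """One block-Gibbs sweep: sample h | v, then v | h."""
    h = (rng.random((v.shape[0], W.shape[1])) < sigmoid(v @ W + c)).astype(float)
    v_new = (rng.random(v.shape) < sigmoid(h @ W.T + b)).astype(float)
    return v_new, h

def cd_k_gradient(v_data, W, b, c, k=1):
    """Biased CD-k estimate of the log-likelihood gradient."""
    h_data = sigmoid(v_data @ W + c)        # positive phase: E[h | v_data]
    v_model = v_data.copy()
    for _ in range(k):                      # k Gibbs sweeps started from the data
        v_model, _ = gibbs_step(v_model, W, b, c)
    h_model = sigmoid(v_model @ W + c)      # negative phase after k steps
    grad_W = (v_data.T @ h_data - v_model.T @ h_model) / v_data.shape[0]
    grad_b = (v_data - v_model).mean(axis=0)
    grad_c = (h_data - h_model).mean(axis=0)
    return grad_W, grad_b, grad_c

# Toy usage: 1000 parallel chains and learning rate alpha = 0.01.
n_vis, n_hid, n_chain, alpha = 16, 8, 1000, 0.01
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b, c = np.zeros(n_vis), np.zeros(n_hid)
v_batch = (rng.random((n_chain, n_vis)) < 0.5).astype(float)
gW, gb, gc = cd_k_gradient(v_batch, W, b, c, k=1)
W, b, c = W + alpha * gW, b + alpha * gb, c + alpha * gc
```

The systematic bias of this estimator for finite k is the issue that the paper's coupling construction is designed to remove, sketched next.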
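
The Pseudocode row lists coupling algorithms (Algorithms 1-3) that turn the CD gradient into an unbiased estimator by running a second, lagged chain coupled to the first and adding a telescoping correction until the two chains meet. Below is a hedged illustration of that general coupled-chain construction on a small finite-state Markov chain, not a reimplementation of the paper's Algorithms 1-3; `maximal_coupling`, `unbiased_estimate`, and the toy transition matrix are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def maximal_coupling(p, q):
    """Draw (x, y) with x ~ p and y ~ q, maximizing P(x == y)."""
    overlap = np.minimum(p, q)
    w = overlap.sum()
    if rng.random() < w:                      # chains meet with probability w
        x = rng.choice(len(p), p=overlap / w)
        return x, x
    x = rng.choice(len(p), p=(p - overlap) / (1.0 - w))
    y = rng.choice(len(q), p=(q - overlap) / (1.0 - w))
    return x, y

def unbiased_estimate(P, x0, f, k=1, max_iter=10_000):
    """Unbiased estimate of E_pi[f] for transition matrix P:
    f(X_k) + sum_{t > k} (f(X_t) - f(Y_{t-1})), stopping once X_t = Y_{t-1}."""
    X, Y = rng.choice(len(P), p=P[x0]), x0    # X_1 leads, Y_0 lags one step
    est = f(X) if k == 1 else 0.0
    for t in range(2, max_iter):              # X holds X_t, Y holds Y_{t-1}
        X, Y = maximal_coupling(P[X], P[Y])
        if t == k:
            est += f(X)
        elif t > k:
            est += f(X) - f(Y)                # zero once the chains have met
        if X == Y and t > k:
            break
    return est

# Toy usage: averaging many independent replications recovers the
# stationary expectation of f with no burn-in bias.
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
estimates = [unbiased_estimate(P, x0=0, f=float, k=1) for _ in range(2000)]
print(np.mean(estimates))
```

In UCD the analogous coupling is applied to the Gibbs sampler over the RBM's visible and hidden units, and the per-chain estimates are averaged across the parallel Markov chains quoted in the Experiment Setup row.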