Partition Functions from Rao-Blackwellized Tempered Sampling
Authors: David Carlson, Patrick Stinson, Ari Pakman, Liam Paninski
ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically find that the new method provides more accurate estimates than Annealed Importance Sampling when calculating partition functions of large Restricted Boltzmann Machines (RBMs); moreover, the method is sufficiently accurate to track training and validation log-likelihoods during learning of RBMs, at minimal computational cost. |
| Researcher Affiliation | Academia | David E. Carlson¹·² (DAVID.EDWIN.CARLSON@GMAIL.COM), Patrick Stinson² (PATRICKSTINSON@GMAIL.COM), Ari Pakman¹·² (ARI@STAT.COLUMBIA.EDU), Liam Paninski¹·² (LIAM@STAT.COLUMBIA.EDU). ¹ Department of Statistics, ² Grossman Center for the Statistics of Mind, Columbia University, New York, NY 10027 |
| Pseudocode | Yes | Algorithm 1 (Rao-Blackwellized Tempered Sampling). Input: {β_k, r_k}_{k=1,...,K}, N. Initialize log Ẑ_k for k = 2,...,K; initialize β ∈ {β_1,...,β_K}; initialize ĉ_k = 0 for k = 1,...,K. For i = 1 to N: transition in x leaving q(x|β) invariant; sample β|x ~ q(β|x); update ĉ_k ← ĉ_k + (1/N) q(β_k|x). End for. Update Ẑ_k^RTS ← Ẑ_k (r_1 ĉ_k)/(r_k ĉ_1) for k = 2,...,K. (A NumPy sketch of this loop is given below the table.) |
| Open Source Code | No | The paper mentions external code for comparison (pymbar) and code for RBM/AIS, but does not provide a link or explicit statement for the open-source code of their proposed RTS method. |
| Open Datasets | Yes | Figure 3 shows a comparison of RTS versus AIS/RAISE on two RBMs trained on the binarized MNIST dataset (M=784, N=60000), with 500 and 100 hidden units. The former was taken from (Salakhutdinov & Murray, 2008), while the latter was trained with the method of (Carlson et al., 2015b). ... This idea is illustrated in Figure 5, which shows estimates of the mean of training and validation log-likelihoods on the dna dataset, with 180 observed binary features, trained on an RBM with 500 hidden units. (The dna dataset is available from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html) |
| Dataset Splits | No | The paper mentions 'training and validation log-likelihoods' but does not provide specific percentages or counts for these splits. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments, such as specific CPU or GPU models. |
| Software Dependencies | No | The paper mentions the 'pymbar package' but does not provide specific version numbers for any software dependencies needed to replicate the experiment. |
| Experiment Setup | Yes | In RTS, the number of inverse temperatures was fixed at K=100, and we performed 10 initial iterations of 50 Gibbs sweeps each, following Section 2.4. ... For the main training effort we used the RMSspectral stochastic gradient method, with a stepsize of 1e-5 and parameter λ = 0.99 ... We considered a tempered space with K = 100 and sampled 25 Gibbs sweeps on 2000 parallel chains between gradient updates. ... We used a prior on the inverse temperatures r_k ∝ exp(2β_k) ... To smooth the noise from such a small number of samples, we consider partial updates of Ẑ_K given by Ẑ_K^(t+1) = Ẑ_K^(t) (r_1 ĉ_K^(t) / (r_K ĉ_1^(t)))^α with α = 0.2. (A sketch of this smoothing step is given below the table.) |
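
The Pseudocode row quotes Algorithm 1 directly. The following is a minimal NumPy sketch of how that loop could be implemented; it is an illustration only, not code from the paper. `gibbs_sweep` (a transition kernel leaving q(x|β) invariant) and `log_f` (the unnormalized log-density at inverse temperature β) are hypothetical user-supplied callables, e.g. for an RBM.

```python
import numpy as np

def rts_estimate(betas, r, log_Z_init, x0, gibbs_sweep, log_f, N, rng=None):
    """Sketch of Algorithm 1 (Rao-Blackwellized Tempered Sampling).

    betas:      (K,) inverse temperatures, betas[0] = beta_1 is the reference.
    r:          (K,) prior weights on the temperatures.
    log_Z_init: (K,) initial estimates of log Z_k (log_Z_init[0] exact).
    Returns the updated log Z_k estimates from the Rao-Blackwellized counts c_k.
    """
    rng = np.random.default_rng() if rng is None else rng
    K = len(betas)
    c_hat = np.zeros(K)                 # Rao-Blackwellized visit counts c_k
    x = x0
    k = rng.integers(K)                 # initialize beta in {beta_1, ..., beta_K}
    for _ in range(N):
        x = gibbs_sweep(x, betas[k])    # transition in x leaving q(x|beta) invariant
        # conditional q(beta_j | x) proportional to f_j(x) r_j / Z_j
        log_w = np.array([log_f(x, betas[j]) for j in range(K)]) \
                + np.log(r) - log_Z_init
        q_beta = np.exp(log_w - log_w.max())
        q_beta /= q_beta.sum()
        k = rng.choice(K, p=q_beta)     # sample beta | x
        c_hat += q_beta / N             # Rao-Blackwellized update of c_k
    # final RTS update: Z_k <- Z_k * (r_1 c_k) / (r_k c_1)
    return log_Z_init + np.log(r[0] * c_hat) - np.log(r * c_hat[0])
```

In practice the returned estimates would be fed back in as `log_Z_init` for the next round, matching the iterated refinement of the Ẑ_k described in Section 2.4 of the paper.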
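
The partial-update rule quoted in the Experiment Setup row damps the full RTS correction factor by an exponent α. Below is a minimal sketch of that smoothing step, assuming the log-domain form reconstructed above; the function name and argument layout are hypothetical, not from the paper.

```python
def smoothed_logZ_update(log_Z_prev, log_c_hat, log_r, alpha=0.2):
    """One partial (smoothed) update of log Z_K between gradient steps.

    log_Z_prev: current estimate of log Z_K.
    log_c_hat:  length-K sequence of log Rao-Blackwellized counts log c_k
                accumulated from the chains run since the last update.
    log_r:      length-K sequence of log prior weights log r_k.
    alpha:      damping factor; alpha = 1 recovers the full RTS update,
                alpha = 0.2 is the value quoted in the paper.
    """
    # full RTS correction for Z_K relative to the reference k = 1
    log_ratio = (log_r[0] + log_c_hat[-1]) - (log_r[-1] + log_c_hat[0])
    return log_Z_prev + alpha * log_ratio
```

In the learning experiment described above, such an update would be applied once per gradient step, after the 25 Gibbs sweeps on the 2000 parallel chains have refreshed the counts ĉ_k.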