GumBolt: Extending Gumbel trick to Boltzmann priors
Authors: Amir H. Khoshaman, Mohammad H. Amin
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | GumBolt achieves state-of-the-art performance on permutation invariant MNIST and OMNIGLOT datasets... The contributions of this work are as follows: ... GumBolt considerably outperforms the previous works in a wide series of experiments on permutation invariant MNIST and OMNIGLOT datasets... Section 5 (Experiments): In order to explore the effectiveness of the GumBolt, we present the results of a wide set of experiments conducted on standard feed-forward structures... We compare the models on statically binarized MNIST... and OMNIGLOT datasets... The 4000-sample estimation of log-likelihood... of the models are reported in Table 1. (A hedged sketch of a multi-sample log-likelihood estimate appears below the table.) |
| Researcher Affiliation | Collaboration | Amir H. Khoshaman, D-Wave Systems Inc., khoshaman@gmail.com; Mohammad H. Amin, D-Wave Systems Inc. and Simon Fraser University, mhsamin@dwavesys.com; author footnote: "Currently at Borealis AI" |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not contain any statement about releasing source code for the described methodology or provide a link to a code repository. |
| Open Datasets | Yes | We compare the models on statically binarized MNIST (Salakhutdinov and Murray, 2008) and OMNIGLOT datasets (Lake et al., 2015) with the usual compartmentalization into the training, validation, and test-sets. |
| Dataset Splits | Yes | We compare the models on statically binarized MNIST (Salakhutdinov and Murray, 2008) and OMNIGLOT datasets (Lake et al., 2015) with the usual compartmentalization into the training, validation, and test-sets. ... The value of temperature was set to 1/7 for all the experiments involving GumBolt, 1/5 for experiments with dVAE, and 1/10 and 1/8 for dVAE++ on the MNIST and OMNIGLOT datasets, respectively; these values were cross-validated from {1/9, ..., 1/5}. (A sketch of the temperature-controlled Gumbel relaxation appears below the table.) |
| Hardware Specification | No | The paper mentions "a GPU implementation of parallel tempering algorithm" but does not specify any particular GPU model (e.g., NVIDIA A100), CPU, or other specific hardware details used for running experiments. |
| Software Dependencies | No | The paper mentions algorithms like ADAM and Persistent Contrastive Divergence (PCD) and cites relevant papers, but it does not specify software names with version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | 1M iterations of parameter updates using the ADAM algorithm (Kingma and Ba, 2014), with the default settings and batch size of 100, were carried out. The initial learning rate is 3e-3 and is subsequently reduced by a factor of 0.3 at 60%, 75%, and 95% of the total iterations. KL annealing (Sønderby et al., 2016) was used via a linear schedule during 30% of the total iterations. The value of temperature was set to 1/7 for all the experiments involving GumBolt, 1/5 for experiments with dVAE, and 1/10 and 1/8 for dVAE++ on the MNIST and OMNIGLOT datasets, respectively; these values were cross-validated from {1/9, ..., 1/5}. ... Sampling the RBM was done by performing 200 steps of Gibbs updates for every mini-batch, in accordance with our baselines, using persistent contrastive divergence (PCD) (Tieleman, 2008). (Hedged sketches of the learning-rate/KL-annealing schedules and of PCD Gibbs sampling appear below the table.) |
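
The Research Type row quotes a "4000-sample estimation of log-likelihood" for model comparison. For VAE-family models, such an estimate is typically an importance-weighted (IWAE-style) bound: log of the mean of k importance weights p(x, z_i)/q(z_i | x), averaged over test points. The sketch below is a minimal, hedged illustration of that log-mean-exp computation with placeholder inputs; it is not the authors' code (none is released), and the function names are ours.

```python
import numpy as np

def log_mean_exp(log_w, axis=-1):
    """Numerically stable log of the mean of exp(log_w) along `axis`."""
    m = np.max(log_w, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.mean(np.exp(log_w - m), axis=axis))

def iwae_log_likelihood(log_weights):
    """k-sample importance-weighted log-likelihood estimate per data point.

    log_weights: array of shape (num_data, k), where each entry is
    log p(x, z_i) - log q(z_i | x) for one posterior sample z_i.
    """
    return log_mean_exp(log_weights, axis=-1)

# Hypothetical usage with k = 4000 samples per test point (placeholder values):
rng = np.random.default_rng(0)
fake_log_w = rng.normal(loc=-90.0, scale=5.0, size=(16, 4000))
print(iwae_log_likelihood(fake_log_w).mean())  # average test log-likelihood estimate
```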
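The temperatures quoted above (1/7 for GumBolt, 1/5 for dVAE, 1/10 and 1/8 for dVAE++) control how closely the continuous Gumbel-softmax relaxation of binary latent variables approximates discrete samples. The sketch below illustrates a generic binary Gumbel-softmax (Concrete) sample at temperature tau; it demonstrates only the trick named in the title, not the paper's exact GumBolt proxy for the Boltzmann prior.

```python
import numpy as np

def binary_gumbel_softmax(logits, tau, rng):
    """Relaxed Bernoulli sample via the binary Gumbel-softmax (Concrete) trick.

    logits: log-odds of each latent unit being 1, shape (batch, num_latents).
    tau:    relaxation temperature; smaller tau -> samples closer to {0, 1}.
    """
    u = rng.uniform(1e-8, 1.0 - 1e-8, size=logits.shape)
    logistic_noise = np.log(u) - np.log1p(-u)  # Logistic(0, 1) noise (difference of two Gumbels)
    return 1.0 / (1.0 + np.exp(-(logits + logistic_noise) / tau))  # sigmoid((logits + L) / tau)

# Hypothetical usage at tau = 1/7, the value reported for the GumBolt runs:
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 10))
zeta = binary_gumbel_softmax(logits, tau=1.0 / 7.0, rng=rng)
print(zeta.min(), zeta.max())  # values lie in (0, 1), concentrating near 0/1 for small tau
```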
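The Experiment Setup row specifies 1M ADAM updates, batch size 100, an initial learning rate of 3e-3 multiplied by 0.3 at 60%, 75%, and 95% of training, and linear KL annealing over the first 30% of iterations. The helpers below are a hedged reconstruction of those two schedules from the quoted numbers; the function names are our own, not the authors'.

```python
TOTAL_ITERS = 1_000_000  # "1M iterations of parameter updates"

def learning_rate(step, base_lr=3e-3, decay=0.3, milestones=(0.60, 0.75, 0.95)):
    """Step-wise learning-rate decay: multiply by `decay` at each milestone fraction."""
    lr = base_lr
    for frac in milestones:
        if step >= frac * TOTAL_ITERS:
            lr *= decay
    return lr

def kl_weight(step, warmup_frac=0.30):
    """Linear KL annealing: ramp the KL term's weight from 0 to 1 over the first 30%."""
    return min(1.0, step / (warmup_frac * TOTAL_ITERS))

print(learning_rate(0), learning_rate(700_000), kl_weight(150_000))  # 0.003, 0.0009, 0.5
```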
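Sampling the RBM prior uses persistent contrastive divergence with 200 block-Gibbs updates per mini-batch. The sketch below shows generic block Gibbs sampling for a binary RBM with a persistent chain of fantasy particles; the RBM parameterization, layer sizes, and variable names are our assumptions, since no source code accompanies the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b_v, b_h, rng):
    """One block-Gibbs update of a binary RBM: sample h given v, then v given h."""
    h = (rng.uniform(size=(v.shape[0], W.shape[1])) < sigmoid(v @ W + b_h)).astype(float)
    v = (rng.uniform(size=v.shape) < sigmoid(h @ W.T + b_v)).astype(float)
    return v

def pcd_sample(persistent_v, W, b_v, b_h, n_steps=200, rng=rng):
    """Advance the persistent chain by n_steps Gibbs updates (200 per mini-batch here)."""
    v = persistent_v
    for _ in range(n_steps):
        v = gibbs_step(v, W, b_v, b_h, rng)
    return v  # reused as the starting state for the next mini-batch

# Hypothetical RBM with 64 visible and 64 hidden units and 100 persistent fantasy particles:
W = 0.01 * rng.normal(size=(64, 64))
b_v = np.zeros(64)
b_h = np.zeros(64)
chain = rng.integers(0, 2, size=(100, 64)).astype(float)
chain = pcd_sample(chain, W, b_v, b_h)
```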