To Relieve Your Headache of Training an MRF, Take AdVIL
Authors: Chongxuan Li, Chao Du, Kun Xu, Max Welling, Jun Zhu, Bo Zhang
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate AdVIL in various undirected generative models, including restricted Boltzmann machines (RBM) (Ackley et al., 1985), deep Boltzmann machines (DBM) (Salakhutdinov & Hinton, 2009), and Gaussian restricted Boltzmann machines (GRBM) (Hinton & Salakhutdinov, 2006), on several real datasets. We empirically demonstrate that (1) compared to the black-box NVIL (Kuleshov & Ermon, 2017) method, AdVIL provides a tighter estimate of the log partition function and achieves much better log-likelihood results; and (2) compared to contrastive divergence based methods (Hinton, 2002; Welling & Sutton, 2005), AdVIL can deal with a broader family of MRFs without model-specific analysis and obtain better results when the model structure gets complex as in DBM. |
| Researcher Affiliation | Academia | Dept. of Comp. Sci. & Tech., BNRist Center, Institute for AI, THBI Lab, Tsinghua University, Beijing, 100084, China; University of Amsterdam; and the Canadian Institute for Advanced Research (CIFAR). |
| Pseudocode | Yes | Algorithm 1 Adversarial variational inference and learning by stochastic gradient descent |
| Open Source Code | Yes | See the source code in https://anonymous.4open.science/r/8c779fbc-6394-40c7-8273-e52504814703/. |
| Open Datasets | Yes | We evaluate our method on the Digits dataset (4), the UCI binary databases (Dheeru & Karra, 2017) and the Frey faces dataset (5). The information of the datasets is summarized in Tab. 3. (...) (4) https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits (5) http://www.cs.nyu.edu/~roweis/data.html |
| Dataset Splits | Yes | Table 3: Dimensions of the visible variables and sizes of the train, validation and test splits. (...) Digits: 64 / 1438 / 359; Adult: 123 / 5,000 / 1,414 / 26,147 |
| Hardware Specification | No | The paper mentions "GPU/DGX Acceleration" in the acknowledgements but does not specify any particular GPU or CPU models, memory, or other hardware components used for the experiments. |
| Software Dependencies | No | We implement our model using the TensorFlow (Abadi et al., 2016) library. In all experiments, q and r are updated 100 times per update of P and Q, i.e. K1 = 100 and K2 = 1. We use the ADAM (Kingma & Ba, 2014) optimizer with the learning rate α = 0.0003, the moving average ratios β1 = 0.5 and β2 = 0.999, and the batch size of 500. The paper mentions TensorFlow and ADAM, but it does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | In all experiments, q and r are updated 100 times per update of P and Q, i.e. K1 = 100 and K2 = 1. We use the ADAM (Kingma & Ba, 2014) optimizer with the learning rate α = 0.0003, the moving average ratios β1 = 0.5 and β2 = 0.999, and the batch size of 500. We use a continuous z and the sigmoid activation function. All these hyperparameters are set according to the validation performance of an RBM on the Digits dataset and fixed throughout the paper unless otherwise stated. |
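The alternating update schedule quoted in the Experiment Setup row (K1 = 100 updates of the variational distributions q and r per K2 = 1 update of the model parameters) can be sketched in plain Python. The update steps here are hypothetical placeholders standing in for the paper's ADAM steps (lr = 0.0003, β1 = 0.5, β2 = 0.999); this is a sketch of the schedule only, not the actual AdVIL losses:

```python
def train(num_outer_steps, k1=100, k2=1):
    """Sketch of the alternating schedule: the inner players (q, r)
    take k1 gradient steps for every k2 steps on the MRF parameters."""
    inner_updates = 0
    outer_updates = 0
    for _ in range(num_outer_steps):
        for _ in range(k1):
            # placeholder: one ADAM step on q and r
            # (lr=3e-4, beta1=0.5, beta2=0.999 per the paper)
            inner_updates += 1
        for _ in range(k2):
            # placeholder: one ADAM step on the MRF parameters
            outer_updates += 1
    return inner_updates, outer_updates

print(train(10))  # → (1000, 10)
```

With the paper's K1 = 100 and K2 = 1, the inner networks are trained two orders of magnitude more often than the model, a common stabilization choice in adversarial training.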
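As a quick sanity check on the Digits dataset cited in the Open Datasets row, it can be loaded through the scikit-learn helper linked there; the 1797 total 64-dimensional examples are consistent with the 64 visible variables and the 1438 + 359 split figures in the Dataset Splits row:

```python
from sklearn.datasets import load_digits

# Load the 8x8 handwritten-digit images as flat 64-dimensional vectors,
# matching the 64 visible variables reported for the Digits dataset.
digits = load_digits()
print(digits.data.shape)  # → (1797, 64); note 1438 + 359 = 1797
```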