To Relieve Your Headache of Training an MRF, Take AdVIL

Authors: Chongxuan Li, Chao Du, Kun Xu, Max Welling, Jun Zhu, Bo Zhang

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate AdVIL in various undirected generative models, including restricted Boltzmann machines (RBM) (Ackley et al., 1985), deep Boltzmann machines (DBM) (Salakhutdinov & Hinton, 2009), and Gaussian restricted Boltzmann machines (GRBM) (Hinton & Salakhutdinov, 2006), on several real datasets. We empirically demonstrate that (1) compared to the black-box NVIL (Kuleshov & Ermon, 2017) method, AdVIL provides a tighter estimate of the log partition function and achieves much better log-likelihood results; and (2) compared to contrastive divergence based methods (Hinton, 2002; Welling & Sutton, 2005), AdVIL can deal with a broader family of MRFs without model-specific analysis and obtain better results when the model structure gets complex as in DBM. [See the RBM sketch after the table.]
Researcher Affiliation | Academia | Dept. of Comp. Sci. & Tech., BNRist Center, Institute for AI, THBI Lab, Tsinghua University, Beijing, 100084, China; University of Amsterdam; and the Canadian Institute for Advanced Research (CIFAR).
Pseudocode | Yes | Algorithm 1: Adversarial variational inference and learning by stochastic gradient descent.
Open Source Code | Yes | See the source code in https://anonymous.4open.science/r/8c779fbc-6394-40c7-8273-e52504814703/.
Open Datasets | Yes | We evaluate our method on the Digits dataset [4], the UCI binary databases (Dheeru & Karra, 2017) and the Frey faces datasets [5]. The information of the datasets is summarized in Tab. 3. (...) [4] https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits [5] http://www.cs.nyu.edu/~roweis/data.html
Dataset Splits | Yes | Table 3: Dimensions of the visible variables and sizes of the train, validation and test splits. (...) Digits: dim 64, splits 1438 / 359 / …; Adult: dim 123, splits 5,000 / 1,414 / 26,147. [See the split sketch after the table.]
Hardware Specification | No | The paper mentions "GPU/DGX Acceleration" in the acknowledgements but does not specify any particular GPU or CPU models, memory, or other hardware components used for the experiments.
Software Dependencies | No | We implement our model using the TensorFlow (Abadi et al., 2016) library. In all experiments, q and r are updated 100 times per update of P and Q, i.e. K1 = 100 and K2 = 1. We use the ADAM (Kingma & Ba, 2014) optimizer with the learning rate α = 0.0003, the moving average ratios β1 = 0.5 and β2 = 0.999, and the batch size of 500. The paper mentions TensorFlow and ADAM, but it does not specify version numbers for any software dependencies.
Experiment Setup | Yes | In all experiments, q and r are updated 100 times per update of P and Q, i.e. K1 = 100 and K2 = 1. We use the ADAM (Kingma & Ba, 2014) optimizer with the learning rate α = 0.0003, the moving average ratios β1 = 0.5 and β2 = 0.999, and the batch size of 500. We use a continuous z and the sigmoid activation function. All these hyperparameters are set according to the validation performance of an RBM on the Digits dataset and fixed throughout the paper unless otherwise stated. [See the training-loop sketch after the table.]
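
The Research Type row turns on estimating the log partition function of models like the RBM. For orientation, here is a minimal sketch, not taken from the paper, of the standard binary-RBM energy E(v, h) = -vᵀWh - bᵀv - cᵀh (Ackley et al., 1985) and a brute-force log Z; all names and shapes are illustrative.

```python
import numpy as np

def rbm_energy(v, h, W, b, c):
    """Energy E(v, h) = -v^T W h - b^T v - c^T h of a binary RBM."""
    return -(v @ W @ h + b @ v + c @ h)

def log_partition_brute_force(W, b, c):
    """Exact log Z = log sum_{v,h} exp(-E(v, h)) by full enumeration.
    Feasible only for tiny models; at realistic sizes this sum is
    intractable, which is why AdVIL and NVIL estimate it variationally."""
    nv, nh = W.shape
    def states(n):  # all 2**n binary vectors of length n
        for i in range(2 ** n):
            yield np.array([(i >> k) & 1 for k in range(n)], dtype=float)
    energies = np.array([rbm_energy(v, h, W, b, c)
                         for v in states(nv) for h in states(nh)])
    m = np.max(-energies)  # log-sum-exp shift for numerical stability
    return m + np.log(np.sum(np.exp(-energies - m)))
```

For a 3-visible, 2-hidden toy model this enumerates 2^5 = 32 joint states; at the paper's scales (64 visible units for Digits and up) the sum is astronomically large, hence the variational machinery.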
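The Dataset Splits row reports a 1438/359 train/validation split of the 1797-sample scikit-learn Digits set. Below is a minimal sketch of how such a split could be reproduced; the binarization threshold and the random seed are assumptions, not details from the paper.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()                                 # 1797 samples x 64 features
X = (digits.data / 16.0 > 0.5).astype(np.float32)      # binarize 0..16 pixels (assumed)

rng = np.random.default_rng(0)                         # seed is an assumption
perm = rng.permutation(len(X))
X_train, X_valid = X[perm[:1438]], X[perm[1438:]]      # 1438 / 359 as in Table 3
print(X_train.shape, X_valid.shape)                    # (1438, 64) (359, 64)
```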
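The Experiment Setup row fixes the optimizer settings and the K1 = 100, K2 = 1 alternating schedule of Algorithm 1. Below is a minimal TensorFlow 2 sketch of that schedule only; inner_loss and outer_loss are hypothetical placeholders for the AdVIL objectives over the variational networks (q, r) and the MRF parameters, which live in the authors' released code.

```python
import tensorflow as tf

# Hyperparameters as reported in the paper.
adam_kwargs = dict(learning_rate=3e-4, beta_1=0.5, beta_2=0.999)
opt_inner = tf.keras.optimizers.Adam(**adam_kwargs)   # for q and r
opt_outer = tf.keras.optimizers.Adam(**adam_kwargs)   # for the MRF parameters
K1, K2 = 100, 1
BATCH_SIZE = 500  # reported batch size for sampling minibatches

def train_step(batch, qr_vars, model_vars, inner_loss, outer_loss):
    """One outer iteration of the alternating schedule: K1 updates of the
    variational networks, then K2 updates of the model."""
    for _ in range(K1):
        with tf.GradientTape() as tape:
            loss_qr = inner_loss(batch)               # placeholder objective
        grads = tape.gradient(loss_qr, qr_vars)
        opt_inner.apply_gradients(zip(grads, qr_vars))
    for _ in range(K2):
        with tf.GradientTape() as tape:
            loss_model = outer_loss(batch)            # placeholder objective
        grads = tape.gradient(loss_model, model_vars)
        opt_outer.apply_gradients(zip(grads, model_vars))
```

Keeping separate Adam instances for the inner and outer problems mirrors the adversarial structure: the variational networks are trained toward convergence between each model update.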