Amortized Bethe Free Energy Minimization for Learning MRFs

Authors: Sam Wiseman, Yoon Kim

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimentally, we find that the proposed approach compares favorably with loopy belief propagation, but is faster, and it allows for attaining better held-out log likelihood than other recent approximate inference schemes. In Table 1 we show the correlation and the mean L1 distance between the true and approximated marginals for the various methods. In Table 2 we show results from learning the generative model alongside the inference network. Table 3 reports the held-out average NLL of learned RBMs, as estimated by AIS [46]. (A sketch of these marginal-agreement metrics follows the table.)
Researcher Affiliation | Academia | Sam Wiseman, Toyota Technological Institute at Chicago, Chicago, IL, USA (swiseman@ttic.edu); Yoon Kim, Harvard University, Cambridge, MA, USA (yoonkim@seas.harvard.edu)
Pseudocode | Yes | Algorithm 1: Saddle-point MRF Learning
Open Source Code | Yes | Code for duplicating experiments is available at https://github.com/swiseman/bethe-min.
Open Datasets | Yes | We train RBMs with 100 hidden units on the UCI digits dataset [1]... We consider learning a K = 30 state, 3rd-order directed neural HMM on sentences from the Penn Treebank [32].
Dataset Splits | Yes | For a randomly generated Ising model, we obtain 1000 samples each for train, validation, and test sets... We used a batch size of 32, and selected hyperparameters through random search, monitoring validation expected pseudo-likelihood [3] for all models; see the Supplementary Material. ...on sentences from the Penn Treebank [32] (using the standard splits and preprocessing by Mikolov et al. [35]).
Hardware Specification | Yes | Speed results were measured on the same 1080 Ti GPU.
Software Dependencies | No | The paper does not mention specific version numbers for software dependencies. It implies the use of common deep learning frameworks but does not provide the required version details.
Experiment Setup | Yes | We used a batch size of 32, and selected hyperparameters through random search... We train inference networks f and f_x to output pseudo-marginals τ and τ_x as in Algorithm 1, using I_1 = 1 and I_2 = 1 gradient updates per minibatch. (A sketch of the Bethe objective these networks minimize follows the table.)
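
The Research Type row cites Table 1 of the paper, which compares methods by the correlation and mean L1 distance between true and approximated marginals. As a minimal sketch of how such agreement metrics can be computed (an illustration only, not the authors' evaluation code; the function name and array layout are assumptions):

```python
import numpy as np

def marginal_agreement(true_marginals, approx_marginals):
    """Pearson correlation and mean L1 distance between two sets of
    marginals, treated as flat vectors of probabilities."""
    t = np.asarray(true_marginals, dtype=float).ravel()
    a = np.asarray(approx_marginals, dtype=float).ravel()
    corr = np.corrcoef(t, a)[0, 1]      # Pearson correlation coefficient
    mean_l1 = np.abs(t - a).mean()      # mean absolute (L1) difference
    return corr, mean_l1

# Hypothetical usage: exact marginals from enumeration vs. approximate pseudo-marginals.
exact = np.array([0.9, 0.1, 0.6, 0.4])
approx = np.array([0.85, 0.15, 0.55, 0.45])
print(marginal_agreement(exact, approx))
```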
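
The Experiment Setup row refers to Algorithm 1 (Saddle-point MRF Learning), in which, roughly, the inference networks f and f_x are updated to minimize Bethe free energies of the unclamped and clamped models, and the MRF parameters are then updated on the resulting log-likelihood surrogate. For reference, below is a minimal PyTorch-style sketch of the Bethe free energy of a pairwise discrete MRF; the function name, tensor layout, and the assumption of locally consistent pseudo-marginals are ours and do not reflect the released implementation.

```python
import torch

def bethe_free_energy(unary_logpot, pair_logpot, edges, tau_i, tau_ij, eps=1e-10):
    """Bethe free energy of a pairwise discrete MRF.

    unary_logpot: (N, K) log-potentials theta_i(x_i)
    pair_logpot:  (E, K, K) log-potentials theta_ij(x_i, x_j), one per edge
    edges:        list of (i, j) node-index pairs
    tau_i:        (N, K) unary pseudo-marginals
    tau_ij:       (E, K, K) pairwise pseudo-marginals

    If the pseudo-marginals are locally consistent, -F_Bethe approximates
    log Z (and is exact when the graph is a tree).
    """
    degree = torch.zeros(unary_logpot.size(0), device=unary_logpot.device)
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1

    # Expected energy: negative expected log-potentials under tau.
    energy = -(tau_i * unary_logpot).sum() - (tau_ij * pair_logpot).sum()

    # Bethe entropy: edge entropies minus over-counted node entropies.
    H_i = -(tau_i * (tau_i + eps).log()).sum(dim=1)
    H_ij = -(tau_ij * (tau_ij + eps).log()).sum(dim=(1, 2))
    bethe_entropy = H_ij.sum() - ((degree - 1.0) * H_i).sum()

    return energy - bethe_entropy
```

In the alternating scheme the quoted setup describes, each minibatch would take I_1 gradient steps on the inference-network parameters to reduce quantities of this form, followed by I_2 steps on the MRF parameters using the difference between the clamped and unclamped values as a surrogate for the negative log-likelihood.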