Amortized Bethe Free Energy Minimization for Learning MRFs
Authors: Sam Wiseman, Yoon Kim
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, we find that the proposed approach compares favorably with loopy belief propagation, but is faster, and it allows for attaining better held-out log-likelihood than other recent approximate inference schemes. In Table 1 we show the correlation and the mean L1 distance between the true and approximated marginals for the various methods. In Table 2 we show results from learning the generative model alongside the inference network. Table 3 reports the held-out average NLL of learned RBMs, as estimated by AIS [46]. |
| Researcher Affiliation | Academia | Sam Wiseman, Toyota Technological Institute at Chicago, Chicago, IL, USA (swiseman@ttic.edu); Yoon Kim, Harvard University, Cambridge, MA, USA (yoonkim@seas.harvard.edu) |
| Pseudocode | Yes | Algorithm 1 Saddle-point MRF Learning |
| Open Source Code | Yes | code for duplicating experiments is available at https://github.com/swiseman/bethe-min. |
| Open Datasets | Yes | We train RBMs with 100 hidden units on the UCI digits dataset [1]... We consider learning a K = 30 state 3rd order directed neural HMM on sentences from the Penn Treebank [32]. |
| Dataset Splits | Yes | For a randomly generated Ising model, we obtain 1000 samples each for train, validation, and test sets... We used a batch size of 32, and selected hyperparameters through random search, monitoring validation expected pseudo-likelihood [3] for all models; see the Supplementary Material. ...on sentences from the Penn Treebank [32] (using the standard splits and preprocessing by Mikolov et al. [35]) |
| Hardware Specification | Yes | speed results were measured on the same 1080 Ti GPU |
| Software Dependencies | No | The paper does not give version numbers for its software dependencies. It implies the use of common deep learning frameworks but does not specify which versions were used. |
| Experiment Setup | Yes | We used a batch size of 32, and selected hyperparameters through random search... We train inference networks f and f_x to output pseudo-marginals τ and τ_x as in Algorithm 1, using I1 = 1 and I2 = 1 gradient updates per minibatch. (A minimal sketch of this objective appears below the table.) |
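
The Pseudocode and Experiment Setup rows refer to Algorithm 1, in which inference networks output pseudo-marginals trained to minimize a Bethe free energy whose negative approximates the log partition function. As a rough, self-contained illustration of that inner objective only, the sketch below minimizes a pairwise Bethe free energy over directly parameterized pseudo-marginals with gradient descent. Everything here is hypothetical and not taken from the authors' released code: the names (`PseudoMarginals`, `bethe_free_energy`, `consistency_penalty`), the toy chain model, and the soft penalty that stands in for the paper's handling of local consistency constraints; the paper also amortizes the pseudo-marginals with a trained network, which is omitted.

```python
# Minimal, hypothetical sketch (not the authors' code): minimize a pairwise
# Bethe free energy over directly parameterized pseudo-marginals, with a soft
# penalty standing in for local-consistency constraints.
import torch
import torch.nn as nn


def bethe_free_energy(unary_logpot, pair_logpot, edges, tau_i, tau_ij):
    """F_Bethe(tau) = E_tau[energy] - H_Bethe(tau) for a pairwise MRF.

    unary_logpot: (N, K) log psi_i;  pair_logpot: (E, K, K) log psi_ij;
    tau_i: (N, K) singleton pseudo-marginals; tau_ij: (E, K, K) pairwise ones.
    """
    eps = 1e-10
    degree = torch.zeros(unary_logpot.size(0))
    for i, j in edges:
        degree[i] += 1.0
        degree[j] += 1.0
    # Expected energy under the pseudo-marginals.
    energy = -(tau_ij * pair_logpot).sum() - (tau_i * unary_logpot).sum()
    # Bethe entropy: edge entropies minus over-counted node entropies.
    edge_ent = -(tau_ij * (tau_ij + eps).log()).sum()
    node_ent = -(tau_i * (tau_i + eps).log()).sum(dim=1)
    return energy - (edge_ent - ((degree - 1.0) * node_ent).sum())


def consistency_penalty(edges, tau_i, tau_ij):
    """Squared violation of local consistency: the row/column sums of each
    pairwise pseudo-marginal should match the corresponding singletons."""
    pen = torch.zeros(())
    for e, (i, j) in enumerate(edges):
        pen = pen + ((tau_ij[e].sum(dim=1) - tau_i[i]) ** 2).sum()
        pen = pen + ((tau_ij[e].sum(dim=0) - tau_i[j]) ** 2).sum()
    return pen


class PseudoMarginals(nn.Module):
    """Stand-in for an inference network: free logits for each tau.
    (The paper instead amortizes these with a trained network.)"""
    def __init__(self, n_nodes, n_edges, n_states):
        super().__init__()
        self.node_logits = nn.Parameter(torch.zeros(n_nodes, n_states))
        self.edge_logits = nn.Parameter(torch.zeros(n_edges, n_states, n_states))

    def forward(self):
        tau_i = torch.softmax(self.node_logits, dim=-1)
        tau_ij = torch.softmax(self.edge_logits.flatten(1), dim=-1).view_as(self.edge_logits)
        return tau_i, tau_ij


# Toy 3-node binary chain 0 - 1 - 2 with random log-potentials.
torch.manual_seed(0)
edges = [(0, 1), (1, 2)]
unary = torch.randn(3, 2)
pairwise = torch.randn(len(edges), 2, 2)

inf_net = PseudoMarginals(3, len(edges), 2)
opt = torch.optim.Adam(inf_net.parameters(), lr=0.05)
for _ in range(500):
    tau_i, tau_ij = inf_net()
    fe = bethe_free_energy(unary, pairwise, edges, tau_i, tau_ij)
    loss = fe + 50.0 * consistency_penalty(edges, tau_i, tau_ij)
    opt.zero_grad()
    loss.backward()
    opt.step()

# On a tree, -F_Bethe at a locally consistent minimizer equals log Z, so the
# two numbers below should roughly agree (the penalty is only approximate).
scores = [unary[0, a] + unary[1, b] + unary[2, c]
          + pairwise[0, a, b] + pairwise[1, b, c]
          for a in range(2) for b in range(2) for c in range(2)]
print("brute-force log Z:", torch.logsumexp(torch.stack(scores), 0).item())
print("-F_Bethe estimate:", -fe.item())
```

In the paper this minimization is amortized by inference networks conditioned on the model (and, for τ_x, on the observed data), and the saddle-point loop of Algorithm 1 alternates such inner updates (I1 and I2 per minibatch) with gradient updates to the MRF parameters.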