Belief Propagation Neural Networks
Authors: Jonathan Kuck, Shuvam Chakraborty, Hao Tang, Rachel Luo, Jiaming Song, Ashish Sabharwal, Stefano Ermon
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We trained BPNN to estimate the partition function of factor graphs from a variety of domains. First, experiments on synthetic Ising models show that BPNN-D can learn to find better fixed points than BP and converge faster. Additionally, BPNN generalizes to Ising models with nearly twice as many variables as seen during training and that were sampled from a different distribution. Second, experiments and an ablation study on the stochastic block model from community detection show that maintaining properties of BP in BPNN improves results over standard GNNs. Finally, model counting experiments performed on real world SAT problems show that BPNN can learn from 10s of training problems, generalize to problems that are harder for an exact model counter, and compute estimates 100s of times faster than handcrafted approximate model counters. |
| Researcher Affiliation | Collaboration | Jonathan Kuck^1, Shuvam Chakraborty^1, Hao Tang^2, Rachel Luo^1, Jiaming Song^1, Ashish Sabharwal^3, and Stefano Ermon^1; ^1 Stanford University, ^2 Shanghai Jiao Tong University, ^3 Allen Institute for Artificial Intelligence. {kuck,shuvamc,rsluo,tsong,ermon}@stanford.edu, silent56@sjtu.edu.cn, ashishs@allenai.org |
| Pseudocode | No | The paper describes the mathematical formulations of the algorithms (e.g., equations 1-5) and flowcharts (Figure 1), but does not present structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions implementation details ('We implemented our BPNN and the baseline GNN using PyTorch Geometric [19]') but does not include an explicit statement about releasing their code or a link to a repository. |
| Open Datasets | Yes | We evaluated the performance of our BPNN using benchmarks from [44], with ground truth model counts obtained using DSharp [37]. The benchmarks fall into 7 categories, including network QMR problems (Quick Medical Reference) [26], network grid problems, and bit-blasted versions of satisfiability modulo theories library (SMTLIB) benchmarks [12]. |
| Dataset Splits | Yes | We trained an iterative BPNN-D layer to lower bound the partition function on a training set of 50 random Ising models of size 10x10 (100 variables). ... We then ran the learned BPNN-D and standard BP on a validation set of 50 Ising models. AND For training, we sample 10 two-class SBMs with 15 nodes... along with four such graphs for validation. ... producing 300 training and 120 validation graphs. AND We trained a separate BPNN on a random sampling of 70% of the problems in each training category. ... Validation was performed on the remaining 30% of problems... (a sketch of the 70/30 per-category split appears after the table). |
| Hardware Specification | No | The paper mentions 'GPU computation' in the context of speedups (e.g., 'When BPNN is allowed to run in parallel on a GPU, it provides median speedups...'), but does not specify any exact GPU models, CPU types, or other detailed hardware specifications. |
| Software Dependencies | No | The paper mentions software like 'PyTorch Geometric' [19], 'libDAI' [35], 'DSharp' [37], 'ApproxMC3' [12, 44] and 'F2' [4, 5], but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | In all our experiments, we initialized the BPNN to output the Bethe approximation obtained by running BP for a specified number of iterations. We used the mean squared error between the BPNN prediction and the ground truth log partition function as our training loss. BPNN-D had a median improvement ratio of 1.7x over BP; please refer to the appendix for complete convergence plots. Among the 44 models where BP converged, the RMSE between the exact log partition function and BPNN-D's estimate was 0.97 compared with 7.20 for BP. ... For this experiment we used a BPNN architecture with 10 iterative layers whose weights were not tied and with MLPs that operate on factor messages (without a BPNN-B layer). As a baseline we trained a 10-layer GNN (maximally powerful GIN architecture) with width 4 on the same dataset. ... with damping set to 0.5, and (2) run for a maximum of 1000 iterations with sequential updates using a random sequence and no damping. We trained a BPNN with 30 iterative BPNN layers that operate on messages (see Appendix C), followed by a BPNN-B layer. (See the training-loop sketch and damping note after the table.) |
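
The 70/30 per-category split referenced under Dataset Splits is straightforward to mirror. The snippet below is a hedged sketch, not the authors' released code (none is available); `problems_by_category`, the seed value, and the `(factor_graph, ln_Z)` pair format are illustrative assumptions.

```python
import random

def split_category(problems, train_frac=0.7, seed=0):
    """Randomly split one benchmark category into train/validation sets.

    `problems` is assumed to be a list of (factor_graph, ln_Z) pairs;
    the 70%/30% proportion follows the protocol described in the paper.
    """
    rng = random.Random(seed)
    shuffled = problems[:]
    rng.shuffle(shuffled)
    n_train = int(round(train_frac * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical usage: problems_by_category maps category name -> list of problems.
# splits = {cat: split_category(ps) for cat, ps in problems_by_category.items()}
```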
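
The Experiment Setup row describes an MSE training objective on the log partition function and a 10-layer, width-4 GIN baseline implemented with PyTorch Geometric. The sketch below illustrates that baseline and loss under stated assumptions: the node-feature encoding of factor graphs, the input dimension, the learning rate, and the batch size are hypothetical, and BPNN itself (iterative BPNN-D layers plus a BPNN-B layer) is not reproduced here.

```python
import torch
from torch import nn
from torch_geometric.nn import GINConv, global_add_pool
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader

class GINBaseline(nn.Module):
    """10-layer GIN of width 4, regressing the log partition function ln(Z)."""
    def __init__(self, in_dim=4, hidden=4, layers=10):
        super().__init__()
        self.convs = nn.ModuleList()
        dim = in_dim
        for _ in range(layers):
            mlp = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                nn.Linear(hidden, hidden))
            self.convs.append(GINConv(mlp))
            dim = hidden
        self.readout = nn.Linear(hidden, 1)

    def forward(self, x, edge_index, batch):
        for conv in self.convs:
            x = torch.relu(conv(x, edge_index))
        # Sum-pool node embeddings per graph, then predict a scalar ln(Z).
        return self.readout(global_add_pool(x, batch)).squeeze(-1)

# Toy graphs standing in for featurized factor graphs (hypothetical encoding);
# y holds the ground-truth ln(Z) for each graph.
graphs = [Data(x=torch.randn(6, 4),
               edge_index=torch.randint(0, 6, (2, 10)),
               y=torch.randn(1))
          for _ in range(8)]
loader = DataLoader(graphs, batch_size=4, shuffle=True)

model = GINBaseline()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # MSE between predicted and exact ln(Z), as in the paper

for epoch in range(5):
    for batch in loader:
        opt.zero_grad()
        pred = model(batch.x, batch.edge_index, batch.batch)
        loss = loss_fn(pred, batch.y.view(-1))
        loss.backward()
        opt.step()
```

In the paper's setup, the regressor would instead be the BPNN (learned BP-style message-passing layers initialized to reproduce the Bethe approximation), trained with the same MSE objective on ln(Z).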
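
For the BP baseline run "with damping set to 0.5", a common damping convention (the excerpt does not state libDAI's exact implementation) mixes the freshly computed message with the previous one, e.g. in the log domain:

$$\log m_{i \to a}^{(t+1)} = \lambda \, \log m_{i \to a}^{(t)} + (1 - \lambda)\, \log \hat{m}_{i \to a}^{(t+1)}, \qquad \lambda = 0.5,$$

where $\hat{m}_{i \to a}^{(t+1)}$ is the undamped BP update; $\lambda = 0$ recovers standard undamped BP, matching the second baseline configuration.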