Bayesian Learning of Sum-Product Networks

Authors: Martin Trapp, Robert Peharz, Hong Ge, Franz Pernkopf, Zoubin Ghahramani

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In various experiments, our Bayesian SPNs often improve test likelihoods over greedy SPN learners. Further, since the Bayesian framework protects against overfitting, we can evaluate hyper-parameters directly on the Bayesian model score, waiving the need for a separate validation set, which is especially beneficial in low-data regimes. Bayesian SPNs can be applied to heterogeneous domains and can easily be extended to nonparametric formulations. Moreover, our Bayesian approach is the first that consistently and robustly learns SPN structures under missing data. We assessed the performance of our approach on discrete [10] and heterogeneous data [43] as well as on three datasets with missing values. We constructed G using the algorithm described in the supplement and used a grid search over the parameters of the graph. Further, we used 5×10^3 burn-in steps and estimated the predictive distribution using 10^4 samples from the posterior.
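As a rough illustration of this protocol (not the authors' Julia implementation), the predictive distribution can be estimated by discarding the burn-in steps and averaging the predictive density over posterior samples; `gibbs_step` and `loglik` below are hypothetical placeholders for the model-specific sampler and likelihood:

```python
import numpy as np

def predictive_loglik(x_test, state, gibbs_step, loglik,
                      n_burnin=5_000, n_samples=10_000):
    """Estimate log p(x* | D) as a log-mean-exp over posterior samples."""
    for _ in range(n_burnin):           # burn-in phase: 5x10^3 steps
        state = gibbs_step(state)
    lls = np.empty((n_samples, len(x_test)))
    for s in range(n_samples):          # 10^4 samples from the posterior
        state = gibbs_step(state)
        lls[s] = loglik(x_test, state)  # log p(x*_i | theta_s) per test point
    # average densities (not log-densities) over samples, numerically stably
    return np.logaddexp.reduce(lls, axis=0) - np.log(n_samples)
```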
Researcher Affiliation | Collaboration | Martin Trapp (1,2), Robert Peharz (3), Hong Ge (3), Franz Pernkopf (1), Zoubin Ghahramani (4,3). Affiliations: (1) Graz University of Technology, (2) OFAI, (3) University of Cambridge, (4) Uber AI. Emails: martin.trapp@tugraz.at, rp587@cam.ac.uk, hg344@cam.ac.uk, pernkopf@tugraz.at, zoubin@eng.cam.ac.uk
Pseudocode | No | The paper mentions an "algorithm to construct region-graphs used in this paper" and refers to the supplement for its detailed description, but it does not provide any pseudocode or algorithm blocks directly within the main text.
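Although the main text contains no pseudocode, the kind of construction it refers to can be sketched. Below is a minimal, hypothetical region-graph builder in the spirit of random region-graphs (RAT-SPN [27]); the paper's actual algorithm is in the supplement, and the parameter names here (`depth`, `n_splits`) are illustrative only:

```python
import random

def build_region_graph(scope, depth, n_splits=2, rng=random.Random(0)):
    """Return a nested (region -> partitions -> sub-regions) structure."""
    region = {"scope": tuple(scope), "partitions": []}
    if depth == 0 or len(scope) < 2:
        return region                        # leaf region
    for _ in range(n_splits):                # several random partitions
        shuffled = list(scope)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        parts = (shuffled[:half], shuffled[half:])
        region["partitions"].append(
            [build_region_graph(p, depth - 1, n_splits, rng) for p in parts])
    return region

graph = build_region_graph(scope=range(8), depth=2)
```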
Open Source Code | Yes | See https://github.com/trappmartin/BayesianSumProductNetworks for an implementation of Bayesian SPNs in the form of a Julia package, accompanied by the code and datasets used for the experiments.
Open Datasets | Yes | We assessed the performance of our approach on discrete [10] and heterogeneous data [43] as well as on three datasets with missing values. We list details on the selected parameters and the runtime for each dataset in the supplement, cf. Table 3. Table 1 lists the test log-likelihood scores of state-of-the-art (SOTA) structure learners, i.e. LearnSPN [10], LearnSPN with parameter optimisation (CCCP) [46] and ID-SPN [35], random region-graphs (RAT-SPN) [27], and the results obtained using Bayesian SPNs (ours) and infinite mixtures of Bayesian SPNs (ours∞) on discrete datasets.

Dataset     | LearnSPN | RAT-SPN | CCCP    | ID-SPN  | ours    | ours∞   | BTD
NLTCS       | -6.11    | -6.01   | -6.03   | -6.02   | -6.00   | -6.02   | -5.97
MSNBC       | -6.11    | -6.04   | -6.05   | -6.04   | -6.06   | -6.03   | -6.03
KDD         | -2.18    | -2.13   | -2.13   | -2.13   | -2.12   | -2.13   | -2.11
Plants      | -12.98   | -13.44  | -12.87  | -12.54  | -12.68  | -12.94  | -11.84
Audio       | -40.50   | -39.96  | -40.02  | -39.79  | -39.77  | -39.79  | -39.39
Jester      | -53.48   | -52.97  | -52.88  | -52.86  | -52.42  | -52.86  | -51.29
Netflix     | -57.33   | -56.85  | -56.78  | -56.36  | -56.31  | -56.80  | -55.71
Accidents   | -30.04   | -35.49  | -27.70  | -26.98  | -34.10  | -33.89  | -26.98
Retail      | -11.04   | -10.91  | -10.92  | -10.85  | -10.83  | -10.83  | -10.72
Pumsb-star  | -24.78   | -32.53  | -24.23  | -22.41  | -31.34  | -31.96  | -22.41
DNA         | -82.52   | -97.23  | -84.92  | -81.21  | -92.95  | -92.84  | -81.07
Kosarak     | -10.99   | -10.89  | -10.88  | -10.60  | -10.74  | -10.77  | -10.52
MSWeb       | -10.25   | -10.12  | -9.97   | -9.73   | -9.88   | -9.89   | -9.62
Book        | -35.89   | -34.68  | -35.01  | -34.14  | -34.13  | -34.34  | -34.14
EachMovie   | -52.49   | -53.63  | -52.56  | -51.51  | -51.66  | -50.94  | -50.34
WebKB       | -158.20  | -157.53 | -157.49 | -151.84 | -156.02 | -157.33 | -149.20
Reuters-52  | -85.07   | -87.37  | -84.63  | -83.35  | -84.31  | -84.44  | -81.87
20 Newsgrp  | -155.93  | -152.06 | -153.21 | -151.47 | -151.99 | -151.95 | -151.02
BBC         | -250.69  | -252.14 | -248.60 | -248.93 | -249.70 | -254.69 | -229.21
AD          | -19.73   | -48.47  | -27.20  | -19.05  | -63.80  | -63.80  | -14.00
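As a small sanity check on the reconstructed table (assuming the scores are test log-likelihoods, so higher, i.e. less negative, is better), one can tabulate the best method per dataset; the two rows below copy entries verbatim:

```python
# Entries copied from the reconstructed Table 1 above.
rows = {
    "NLTCS": {"LearnSPN": -6.11, "RAT-SPN": -6.01, "CCCP": -6.03,
              "ID-SPN": -6.02, "ours": -6.00, "ours-inf": -6.02},
    "Audio": {"LearnSPN": -40.50, "RAT-SPN": -39.96, "CCCP": -40.02,
              "ID-SPN": -39.79, "ours": -39.77, "ours-inf": -39.79},
}
for dataset, scores in rows.items():
    best = max(scores, key=scores.get)   # higher log-likelihood is better
    print(f"{dataset}: best = {best} ({scores[best]:.2f})")
```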
Dataset Splits | No | Since the Bayesian framework is protected against overfitting, we can evaluate hyper-parameters directly on the Bayesian model score, waiving the need for a separate validation set, which is especially beneficial in low-data regimes. Note that within the validation loop, the computational graph remains fixed. All methods have been trained using the full training set, i.e. training and validation set combined, and were evaluated using default parameters to ensure a fair comparison across methods and levels of missing values.
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications).
Software Dependencies | No | The paper mentions that an "implementation of Bayesian SPNs in form of a Julia package" is available, but it does not specify the version numbers for Julia or any other software dependencies, libraries, or solvers used in the experiments.
Experiment Setup | Yes | We constructed G using the algorithm described in the supplement and used a grid search over the parameters of the graph. Further, we used 5×10^3 burn-in steps and estimated the predictive distribution using 10^4 samples from the posterior. Since the Bayesian framework is protected against overfitting, we combined training and validation sets and followed classical Bayesian model selection [34], i.e. using the Bayesian model evidence. All methods have been trained using the full training set, i.e. training and validation set combined, and were evaluated using default parameters to ensure a fair comparison across methods and levels of missing values. See supplement Section A.3 for further details.
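A hedged sketch of this selection loop follows: hyper-parameters of the region graph are chosen by grid search directly on a Bayesian model score, so no held-out validation split is needed. Here `fit_and_score` stands in for training a Bayesian SPN and returning its model evidence; both the helper and the grid values are assumptions, not the paper's:

```python
from itertools import product

def select_hyperparameters(train_data, fit_and_score,
                           depths=(1, 2, 3), n_splits=(2, 4)):
    """Grid search scored by the Bayesian model evidence (no val. split)."""
    best_cfg, best_score = None, float("-inf")
    for depth, splits in product(depths, n_splits):
        score = fit_and_score(train_data, depth=depth, n_splits=splits)
        if score > best_score:               # higher evidence wins
            best_cfg, best_score = (depth, splits), score
    return best_cfg, best_score
```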