Learning the Parameters of Bayesian Networks from Uncertain Data
Authors: Segev Wasserkrug, Radu Marinescu, Sergey Zeltyn, Evgeny Shindin, Yishai A. Feldman (pp. 12190-12197)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present an approach for learning Bayesian network parameters that explicitly incorporates such uncertainty, and which is a natural extension of the Bayesian network formalism. We present a generalization of the Expectation Maximization parameter learning algorithm that enables it to handle any historical data with likelihood-evidence-based uncertainty, as well as an empirical validation demonstrating the improved accuracy and convergence enabled by our approach. |
| Researcher Affiliation | Industry | Segev Wasserkrug1, Radu Marinescu2, Sergey Zeltyn1, Evgeny Shindin1, Yishai A. Feldman1 1 IBM Research Haifa 2 IBM Research Europe |
| Pseudocode | Yes | Algorithm 1 EM-Likelihood: an EM algorithm for learning with likelihood evidence (an illustrative sketch of the idea appears after the table) |
| Open Source Code | Yes | We implemented our algorithm on top of the open-source Merlin library and used three networks for validation. (Footnote: Available at http://github.com/radum2275/merlin.) |
| Open Datasets | No | The paper describes generating samples from specified Bayesian networks ("Asia network", "Child network") but does not provide access to the generated datasets. For example: "Sample data for the extended network with the assigned CPTs." and "Create a dataset with likelihood evidence from the sampled data." |
| Dataset Splits | No | The paper does not explicitly specify training, validation, and test splits. It mentions collecting a dataset of "100,000 observations per experiment" and then running "EM parameter estimation for the dataset with likelihood evidence and the deterministic dataset" and comparing "goodness-of-fit with respect to the actual network CPTs." (A possible fit metric is sketched after the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments. |
| Software Dependencies | No | The paper mentions using "Merlin" and "Hugin" but does not specify version numbers for these software dependencies, for example: "We implemented our algorithm on top of the open-source Merlin library" and "we used Hugin to generate samples from the networks." |
| Experiment Setup | Yes | In all experiments we initialized the algorithm using a uniform distribution. ... Then, in Step 7, these observations were merged into a dataset with 100,000 observations per experiment, 20,000 observations per each value for each uncertain node. ... For each uncertain node, we varied the CPTs (and thereby generated likelihood evidence), over the values 0.6, 0.7, 0.8, 0.9 and 0.95. (A data-generation sketch appears after the table.) |
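
The paper's Algorithm 1 (EM-Likelihood) is only named above, not reproduced. Below is a minimal Python sketch of the general idea: EM over a Bayesian network in which each record carries likelihood ("virtual") evidence instead of hard observations, worked out for a toy two-node binary network A -> B. The network, record layout, and function names are assumptions made for this example; they are not taken from the paper or from the Merlin implementation.

```python
import numpy as np

def em_likelihood(records, n_iters=50):
    """records: list of (lam_a, lam_b) pairs; each element is a length-2
    numpy array of likelihoods over the states of A and B, respectively."""
    # Parameters p(A) and p(B | A), initialized uniformly
    # (the paper's experiments also start from a uniform distribution).
    p_a = np.full(2, 0.5)
    p_b_given_a = np.full((2, 2), 0.5)  # rows index a, columns index b

    for _ in range(n_iters):
        counts_a = np.zeros(2)
        counts_ab = np.zeros((2, 2))
        for lam_a, lam_b in records:
            # E-step: posterior over joint states given the likelihood evidence,
            #   p(a, b | lam) proportional to p(a) * p(b|a) * lam_a(a) * lam_b(b).
            joint = p_a[:, None] * p_b_given_a * lam_a[:, None] * lam_b[None, :]
            joint /= joint.sum()
            counts_a += joint.sum(axis=1)
            counts_ab += joint
        # M-step: re-estimate the parameters from the expected counts.
        p_a = counts_a / counts_a.sum()
        p_b_given_a = counts_ab / counts_ab.sum(axis=1, keepdims=True)
    return p_a, p_b_given_a

# Two records of soft observations; hard evidence would be one-hot vectors.
records = [(np.array([0.9, 0.1]), np.array([0.7, 0.3])),
           (np.array([0.2, 0.8]), np.array([0.4, 0.6]))]
print(em_likelihood(records))
```

Hard evidence is the special case in which every likelihood vector is one-hot, so the update reduces to the usual EM expected counts.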
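
The experiment-setup quote suggests that likelihood evidence was created by attaching a reliability value from {0.6, 0.7, 0.8, 0.9, 0.95} to each sampled value of an uncertain node. The sketch below shows one plausible way to do that, placing probability q on the sampled state and spreading the remainder uniformly over the other states; this exact mapping is an assumption for illustration, not a statement of the authors' procedure.

```python
import numpy as np

# Reliability levels the paper reports varying over for the uncertain nodes.
Q_VALUES = [0.6, 0.7, 0.8, 0.9, 0.95]

def to_likelihood_evidence(sampled_state, q, n_states=2):
    """Turn a hard sample of an uncertain node into a likelihood vector that
    puts probability q on the sampled state and spreads the rest uniformly.
    (Illustrative assumption, not the authors' code.)"""
    lam = np.full(n_states, (1.0 - q) / (n_states - 1))
    lam[sampled_state] = q
    return lam

# e.g. a sampled state of 1 observed with reliability 0.9 becomes [0.1, 0.9]
print(to_likelihood_evidence(1, 0.9))
```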
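
The paper compares goodness-of-fit "with respect to the actual network CPTs" without the metric being spelled out in the quotes above. One common choice, shown purely as an illustration, is the average KL divergence between corresponding rows of the true and learned CPTs.

```python
import numpy as np

def mean_cpt_kl(true_cpt, learned_cpt, eps=1e-12):
    """Average KL divergence between corresponding rows of two CPTs whose
    last axis indexes the child variable's states."""
    t = np.clip(true_cpt, eps, 1.0)
    l = np.clip(learned_cpt, eps, 1.0)
    return np.sum(t * np.log(t / l), axis=-1).mean()

true_cpt = np.array([[0.9, 0.1], [0.3, 0.7]])
learned_cpt = np.array([[0.85, 0.15], [0.35, 0.65]])
print(mean_cpt_kl(true_cpt, learned_cpt))
```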