Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Automatically Finding the Right Probabilities in Bayesian Networks
Authors: Bahare Salmani, Joost-Pieter Katoen
JAIR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on several benchmarks show that our parameter synthesis techniques can treat parameter synthesis for Bayesian networks (with hundreds of unknown parameters) that are out of reach for existing techniques. We then report on various experiments for analyzing classical, i.e., non-parametric BNs in Section 6. An extensive experimental validation using a prototypical implementation on top of the Storm model checker (Dehnert et al., 2017) compares the efficiency of probabilistic inference using explicit-state PMC techniques to Ace, a tool that carries out inference on BNs using compilation into arithmetic circuits. Section 8 reports on the experimental validation of our approach using a tool-chain built on top of the Storm model checker as well as the parameter synthesis tool Prophesy (Dehnert et al., 2015). |
| Researcher Affiliation | Academia | Bahare Salmani EMAIL Joost-Pieter Katoen EMAIL RWTH Aachen University Aachen, Germany |
| Pseudocode | No | The paper describes algorithms at an intuitive level in Section 7: "All algorithms in this paper are described at an intuitive level, and some are illustrated with examples; references to detailed descriptions of the algorithms are provided." However, it does not include any explicitly labeled pseudocode blocks or algorithm listings with structured steps. |
| Open Source Code | Yes | A prototypical tool-chain that is publicly available19 supports all reported capabilities. 19https://github.com/baharSlmn/storm-bn |
| Open Datasets | Yes | We took the BN benchmarks from the bnlearn repository (Scutari, 2019) and conducted our experiments on a 2.3 GHz Intel Core i5 processor with 16 GB RAM. |
| Dataset Splits | No | The paper mentions using BN benchmarks from the bnlearn repository but does not specify any training, test, or validation splits for its experiments. The experiments focus on inference, sensitivity analysis, and parameter tuning, which do not require explicit train/test splits in the way supervised learning tasks do. |
| Hardware Specification | Yes | We conducted our experiments on a 2.3 GHz Intel Core i5 processor with 16 GB RAM. |
| Software Dependencies | No | The paper mentions several tools used, such as Storm (Dehnert et al., 2017), Prophesy (Dehnert et al., 2015), Jani (Budde et al., 2017), mcsta (Hartmanns & Hermanns, 2014), EPMC (Fu et al., 2022), and PRISM (Kwiatkowska et al., 2011), often citing the foundational papers. However, it does not provide specific version numbers for these software components or for its own implementation, stating only "latest package" for QCQP. |
| Experiment Setup | Yes | The following GD constants in Storm are set according to the standard settings in the literature (Kingma & Ba, 2015; Ruder, 2016; Liu et al., 2020): the batch size is 32, the average decay is 0.9, and the squared average decay is 0.999. |
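The gradient-descent constants quoted above (average decay 0.9, squared average decay 0.999) match the standard Adam settings of Kingma & Ba (2015). As a hedged illustration only, the sketch below shows a single generic Adam update using those constants; it is a reconstruction of the cited standard settings, not the authors' Storm implementation, and the learning rate and epsilon are assumed defaults not stated in the paper.

```python
# Illustrative sketch of the standard Adam settings the paper cites
# (Kingma & Ba, 2015): batch size 32, first-moment decay 0.9,
# second-moment decay 0.999. NOT the authors' Storm code.

BATCH_SIZE = 32   # mini-batch size reported in the paper
BETA1 = 0.9       # "average decay" (first-moment decay)
BETA2 = 0.999     # "squared average decay" (second-moment decay)
EPS = 1e-8        # numerical-stability constant (assumed, not stated)


def adam_step(param, grad, m, v, t, lr=0.001):
    """One Adam update on a scalar parameter.

    m, v are the running first and second moment estimates;
    t is the 1-based step count used for bias correction.
    """
    m = BETA1 * m + (1 - BETA1) * grad
    v = BETA2 * v + (1 - BETA2) * grad ** 2
    m_hat = m / (1 - BETA1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - BETA2 ** t)  # bias-corrected second moment
    param -= lr * m_hat / (v_hat ** 0.5 + EPS)
    return param, m, v
```

With a positive gradient, a step moves the parameter downward by roughly the learning rate once the moment estimates are bias-corrected, which is why these decay constants are safe defaults across very different objectives.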