Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Predicting the Hardness of Learning Bayesian Networks
Authors: Brandon Malone, Kustaa Kangas, Matti Jarvisalo, Mikko Koivisto, Petri Myllymaki
AAAI 2014 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results, based on the largest evaluation of state-of-the-art BNS learning algorithms to date, demonstrate that we can predict the runtimes to a reasonable degree of accuracy, and effectively select algorithms that perform well on a particular instance. |
| Researcher Affiliation | Academia | Helsinki Institute for Information Technology & Department of Computer Science, University of Helsinki, Finland |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions links to external solvers used (GOBNILP, URLearning) and an 'online supplement' for 'more details' (http://bnportfolio.cs.helsinki.fi/), but does not explicitly state that the source code for their own methodology is being released or provide a clear link to it. |
| Open Datasets | Yes | Datasets originating from the UCI repository (Bache and Lichman 2013). 19 datasets. Datasets sampled from benchmark Bayesian networks, downloaded from http://www.cs.york.ac.uk/aig/sw/gobnilp/. |
| Dataset Splits | Yes | We evaluated the portfolios using the standard 10-fold cross-validation technique. That is, the data is partitioned into 10 non-overlapping subsets. In each fold, 9 of the subsets are used to train the model, and the remaining set is used for testing; each subset is used as the testing set once. |
| Hardware Specification | Yes | For running the experiments we used a cluster of Dell PowerEdge M610 computing nodes equipped with two 2.53GHz Intel Xeon E5540 CPUs and 32-GB RAM. |
| Software Dependencies | Yes | We use the GOBNILP solver, version 1.4.1 (http://www.cs.york.ac.uk/aig/sw/gobnilp/) as a representative for ILP. GOBNILP uses the SCIP framework (http://scip.zib.de/) and an external linear program solver; we used SCIP 3.0.1 and SoPlex 1.7.1 (http://soplex.zib.de/). |
| Experiment Setup | Yes | For each dataset and scoring function, we generated scores with parent limits ranging from 2 to up to 6. For each individual run, we used a timeout of 2 hours and a 28-GB memory limit. We considered 5 different scoring functions: BDeu with the Equivalent Sample Size parameter selected from {0.1, 1, 10, 100} and the BIC scores. |
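
The Dataset Splits row describes standard 10-fold cross-validation: the data is partitioned into 10 non-overlapping subsets, and each subset serves as the test set exactly once. A minimal sketch of that partitioning follows. The function name `k_fold_splits`, the shuffling step, and the seed are illustrative assumptions, not details taken from the paper.

```python
import random


def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold.

    Hypothetical helper for illustration; the paper does not say how
    the folds were drawn, so the seeded shuffle is an assumption.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so the folds do not
    # overlap and their sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # Train on the other k - 1 folds.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Example: 25 instances split into 10 folds.
splits = list(k_fold_splits(n_items=25, k=10))
```

Each element of `splits` pairs roughly nine tenths of the instances for training with the remaining tenth for testing, which matches the "9 of the subsets are used to train the model" description in the table.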