Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Structure Discovery in Bayesian Networks by Sampling Partial Orders
Authors: Teppo Niinimäki, Pekka Parviainen, Mikko Koivisto
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have implemented the proposed partial-order-MCMC method, including the extensions based on MC3 and AIS, for the special case of bucket orders (see Examples 4–7). We will refer to these three variants of the methods simply as MCMC, MC3, and AIS. This section reports experimental results on a selection of data sets of different characteristics. Details of the data sets and the employed Bayesian models are given in Section 6.1. Implementation details of the computational methods are given in Section 6.2. We aim to answer four main questions: Does sampling partial orders provide us with a significant advantage over sampling linear orders? How accurate is AIS as compared to MC3? Does the bias correction approach (i.e., scaling by the number of linear extensions) work in practice? How well can we estimate and lower bound the marginal likelihood of the model? We address these questions in Sections 6.3–6.6, respectively. |
| Researcher Affiliation | Academia | Teppo Niinimäki EMAIL Helsinki Institute for Information Technology, Department of Computer Science, P.O. Box 68 (Gustaf Hällströmin katu 2b), FI-00014 University of Helsinki, Finland. Pekka Parviainen EMAIL Helsinki Institute for Information Technology, Department of Computer Science, P.O. Box 15400 (Konemiehentie 2), FI-00076 Aalto, Finland. Mikko Koivisto EMAIL Helsinki Institute for Information Technology, Department of Computer Science, P.O. Box 68 (Gustaf Hällströmin katu 2b), FI-00014 University of Helsinki, Finland. |
| Pseudocode | Yes | Algorithm All Arcs. Input: a partial order P on N and φv(Av) for all v ∈ N and Av ∈ Cv. Output: φst(P) for all pairs s, t ∈ N. 1. Compute αv(Y) for all v ∈ N and Y ∈ D. 2. Compute F(Y) and B(Y) for all Y ∈ D. 3. For each t ∈ N: (a) Compute γt(At) for all At ∈ Ct. (b) For each s ∈ N: compute φst(P) using (11). |
| Open Source Code | Yes | The program BEANDisco, written in C++, is publicly available at www.cs.helsinki.fi/u/tzniinim/BEANDisco/. |
| Open Datasets | Yes | The Flare, German, Mushroom, and Spambase data sets are obtained from the UCI Machine Learning Repository (Lichman, 2013). The Alarm data set was generated from the Alarm network (Beinlich et al., 1989). |
| Dataset Splits | No | The paper describes using 'the whole data set and a subsample consisting of 1000 randomly selected records of the data set' for Mushroom, and details about burn-in and thinning for MCMC samples. However, it does not provide specific training/test/validation splits for the datasets themselves. The '50% of the samples were treated as burn-in samples' refers to MCMC chain samples, not a data split for model training/evaluation. |
| Hardware Specification | Yes | The available memory was limited to 16 GB. |
| Software Dependencies | No | The paper states: 'The program BEANDisco, written in C++, is publicly available at www.cs.helsinki.fi/u/tzniinim/BEANDisco/.' While it mentions the programming language C++, it does not specify any particular libraries or their version numbers, which would be necessary for reproducibility. |
| Experiment Setup | Yes | In these models we set the maximum indegree parameter k to 4 for all data sets, except for Spambase, for which we set the value to 3 in order to keep the per-sample computations feasible. ... The sampling space was set to the balanced bucket orders of maximum bucket size b (see Example 4). Separately for each data set, we set the parameter b to a value as large as possible, subject to the condition that its impact on the running time is no more than about 2 times the impact of the terms that do not depend on b. ... We employed swap proposals, as described in Example 7. ... Burn-in iterations. Always 50% of the samples were treated as burn-in samples that were not included in the estimates of the quantities of interest. ... Thinning. We included only every 1024th of the visited states in the final sample. ... Per configuration we allowed a total running time of 4 days... We ran each configuration 7 times, starting from states drawn independently and uniformly at random. ... Tempering scheme. We used the linear stepping scheme. For MC3 we varied the number of temperature levels K in {3, 15, 63}... For AIS we set the number of levels proportionally to the data size, K = κ·mn, where the factor κ varied in {1/4, 1, 4}. |
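The bias-correction approach quoted in the Research Type row scales each sampled partial order's weight by its number of linear extensions. For the bucket orders used in the paper's experiments this count has a simple closed form: the buckets themselves are totally ordered, so only the orderings within each bucket vary, giving a product of factorials of the bucket sizes. A minimal sketch of that count (the function name and signature are illustrative, not from the paper or BEANDisco):

```python
from math import factorial

def linear_extension_count(bucket_sizes):
    """Number of linear extensions of a bucket order.

    A bucket order places its buckets in a fixed sequence; every
    element of an earlier bucket precedes every element of a later
    one, while elements inside a bucket are mutually incomparable.
    Each linear extension therefore chooses an arbitrary ordering
    within each bucket independently, so the count is the product
    of b_i! over the bucket sizes b_i.
    """
    count = 1
    for b in bucket_sizes:
        count *= factorial(b)
    return count
```

For example, a bucket order with sizes [2, 2] has 2! * 2! = 4 linear extensions, while a chain of singleton buckets has exactly one, matching the intuition that a total order admits only itself as a linear extension.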