Advances in Learning Bayesian Networks of Bounded Treewidth
Authors: Siqi Nie, Denis D. Mauá, Cassio P. de Campos, Qiang Ji
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The approaches are empirically compared to each other and to state-of-the-art methods on a collection of public data sets with up to 100 variables. |
| Researcher Affiliation | Academia | Siqi Nie, Rensselaer Polytechnic Institute, Troy, NY, USA (nies@rpi.edu); Denis D. Mauá, University of São Paulo, São Paulo, Brazil (denis.maua@usp.br); Cassio P. de Campos, Queen's University Belfast, Belfast, UK (c.decampos@qub.ac.uk); Qiang Ji, Rensselaer Polytechnic Institute, Troy, NY, USA (qji@ecse.rpi.edu) |
| Pseudocode | Yes | Algorithm 1 Learning a structure of bounded treewidth by sampling Dandelion codes. ... Algorithm 2 Sampling a partial order within a k-tree. |
| Open Source Code | No | The paper provides links to the implementations of competing methods (TWILP and K&P) but does not state that the code for its own proposed methods (MILP, S+K&P, S2) is open source or publicly available. It mentions that "The S+K&P and S2 algorithms were implemented (purely) in Matlab" but provides no link. |
| Open Datasets | Yes | on a collection of data sets from the UCI repository. The data sets were selected so as to span a wide range of dimensionality, and were preprocessed to have variables discretized over the median value when needed. Some columns of the original data sets audio and community were discarded: 7 variables of audio had a constant value, 5 variables of community have almost one different value per sample (such as personal data), and 22 variables had missing data (Table 1 shows the number of (binary) variables after pre-processing). |
| Dataset Splits | No | The paper mentions evaluating on "public data sets" and discusses "unseen data" in a general sense, but it does not specify explicit training/validation/test splits, percentages, or the methodology (e.g., k-fold cross-validation) for partitioning the datasets. |
| Hardware Specification | No | The paper states only resource limits, not a concrete machine specification (no CPU model or clock speed): "The experiments were run in a computer with 32 cores, memory limit of 64GB, time limit of 3h and maximum number of parents of three (the latter restriction facilitates the experiments and does not constrain the treewidth)." |
| Software Dependencies | Yes | While we emphasize that one should be careful when directly comparing execution time between methods, as the implementations use different languages (we are running CPLEX 12.4, the original K&P uses a Cython compiled Python code, TWILP uses a Python interface to CPLEX to generate the cutting plane mechanism) |
| Experiment Setup | Yes | We used treewidth bounds of 4 and 10, and maximum parent set size of 3, except for hill and community, where it was set as 2 to help the integer programming approaches (which suffer the most from large parent sets). ... Both MILP and TWILP used CPLEX 12.4 with a memory limit of 64GB to solve the optimizations. We have allowed CPLEX to run up to three hours, collecting the incumbent solution after 10 minutes. S+K&P and S2 have been given 10 minutes. ... In all experiments, we maximize the Bayesian Dirichlet equivalent uniform score with equivalent sample size equal to one. |
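The scoring criterion named in the experiment-setup row, the Bayesian Dirichlet equivalent uniform (BDeu) score with equivalent sample size 1, is a standard decomposable score that the paper's algorithms maximize. As a point of reference, a minimal sketch of the per-family BDeu term from complete discrete data is shown below; the function and variable names are our own illustration, not code from the paper:

```python
from math import lgamma
from collections import Counter

def bdeu_family_score(data, child, parents, arities, ess=1.0):
    """BDeu contribution of one variable (child) given a candidate parent set.

    data    -- list of tuples, one discrete sample per row (states as ints)
    child   -- column index of the variable being scored
    parents -- tuple of column indices of its parents
    arities -- number of states of each column
    ess     -- equivalent sample size (the paper uses ess = 1)
    """
    r = arities[child]                 # number of child states
    q = 1
    for p in parents:                  # number of parent configurations
        q *= arities[p]
    a_ij = ess / q                     # Dirichlet pseudo-count per parent config
    a_ijk = ess / (q * r)              # pseudo-count per (config, child state) pair

    # Count N_ij (samples per parent config) and N_ijk (per config and child state).
    n_ij, n_ijk = Counter(), Counter()
    for row in data:
        j = tuple(row[p] for p in parents)
        n_ij[j] += 1
        n_ijk[(j, row[child])] += 1

    # Log marginal likelihood of the family under BDeu priors.
    score = 0.0
    for j, n in n_ij.items():
        score += lgamma(a_ij) - lgamma(a_ij + n)
    for (j, k), n in n_ijk.items():
        score += lgamma(a_ijk + n) - lgamma(a_ijk)
    return score
```

Iterating only over observed counts is sound because terms with zero counts cancel (lgamma(a) - lgamma(a + 0) = 0). In a structure-learning loop, such family scores would be precomputed for every candidate parent set (here, of size at most three) before the treewidth-bounded search begins.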