An Improved Lower Bound for Bayesian Network Structure Learning

Authors: Xiannian Fan, Changhe Yuan

AAAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that the new partition can significantly improve the efficiency and scalability of heuristic search-based structure learning algorithms.
Researcher Affiliation | Academia | Xiannian Fan and Changhe Yuan, Graduate Center and Queens College, City University of New York, 365 Fifth Avenue, New York, NY 10016, {xfan2@gc, changhe.yuan@qc}.cuny.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating that its source code is publicly available.
Open Datasets | Yes | We empirically evaluated our new CG lower bound using the A* and BFBnB algorithms on benchmark datasets from the UCI machine learning repository and the Bayesian Network Repository (http://compbio.cs.huji.ac.il/Repository/).
Dataset Splits | No | The paper mentions using 'benchmark datasets' but does not specify how these datasets were split into training, validation, or test sets, nor does it refer to predefined splits with these details.
Hardware Specification | Yes | The experiments were performed on an IBM System x3850 X5 with 16-core 2.67GHz Intel Xeon Processors and 512G RAM.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments.
Experiment Setup | Yes | We set the threshold γ to 22 for small datasets (fewer than 35 variables), for which the heuristic can be built within 5 seconds, and to 25 for large datasets (35 or more variables), for which the heuristic can be built within 60 seconds. Anytime Window A* (AWA*) was used to provide an upper bound for pruning, since a previous study (Malone and Yuan 2013) showed that AWA* is effective at finding high-quality solutions quickly; we obtained the upper bound by running AWA* for 5 seconds on small datasets (fewer than 35 variables) and 10 seconds on larger datasets.
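As a reading aid, the size-dependent settings quoted above can be sketched as a small configuration function. This is a hypothetical sketch of the reported setup, not the authors' code; the function and key names are illustrative.

```python
# Hypothetical sketch of the experiment configuration reported in the paper:
# the heuristic threshold gamma and the AWA* upper-bound time budget are
# chosen by dataset size (number of variables). Names are illustrative.

def experiment_config(num_variables: int) -> dict:
    """Return the reported heuristic threshold and AWA* time budget."""
    small = num_variables < 35  # paper's small/large cutoff
    return {
        "gamma": 22 if small else 25,            # heuristic-construction threshold
        "awastar_seconds": 5 if small else 10,   # AWA* run time for the upper bound
    }

print(experiment_config(20))   # small dataset settings
print(experiment_config(40))   # large dataset settings
```

The only decision variable is the 35-variable cutoff; both settings follow from it.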