Learning Treewidth-Bounded Bayesian Networks with Thousands of Variables

Authors: Mauro Scanagatta, Giorgio Corani, Cassio P. de Campos, Marco Zaffalon

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare k-A*, k-G, S2, S2+ and TWILP in various experiments. We compare them through an indicator which we call W-score: the percentage of worsening of the BIC score... For the first time we present experimental results for structural learning with bounded treewidth for domains involving up to ten thousand variables.
Researcher Affiliation | Academia | Mauro Scanagatta, IDSIA, SUPSI, USI, Lugano, Switzerland, mauro@idsia.ch; Giorgio Corani, IDSIA, SUPSI, USI, Lugano, Switzerland, giorgio@idsia.ch; Cassio P. de Campos, Queen's University Belfast, Northern Ireland, UK, c.decampos@qub.ac.uk; Marco Zaffalon, IDSIA, Lugano, Switzerland, zaffalon@idsia.ch
Pseudocode | No | The paper describes the algorithms k-A* and k-G in prose but does not provide them in a structured pseudocode or algorithm block.
Open Source Code | Yes | Software and supplementary material are available from http://blip.idsia.ch.
Open Datasets | Yes | We now present experiments on the data sets considered by Nie et al. (2016). They involve up to 100 variables. We consider 10 large data sets (100 ≤ n ≤ 400) listed in Table 3. Eventually we consider 14 very large data sets, containing between 400 and 10000 variables... three randomly-generated synthetic data sets... generated using the software BNGenerator (footnote 4: http://sites.poli.usp.br/pmr/ltd/Software/BNGenerator/)
Dataset Splits | No | The paper does not provide train/validation/test split information. It mentions 'a complete data set of N instances D = {D1, ..., DN}' and 'We split each data set randomly into three subsets' for experimental purposes, but these subsets are not train/validation/test splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions the 'Gobnilp solver' and a 'Max SAT solver' without version numbers. It also mentions BNGenerator for synthetic data generation, but this is not a core software dependency of the main methodology.
Experiment Setup | Yes | We allow 60 seconds of time for the computation of the scores of the parent set of each variable, in each data set. We allow each method to run for ten minutes. We let each method run for one hour. We set the bounded treewidth to k = 4. We consider the following treewidths: k ∈ {2, 5, 8}. All variables are binary and we sample their conditional probability tables from a Beta(1,1). We sample 10,000 instances from each generated inverted tree.
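The Experiment Setup and Research Type rows describe two computations that can be made concrete: sampling data from a random binary network whose conditional probability tables are drawn from Beta(1,1), and comparing BIC scores as a percentage of worsening (the W-score). The Python sketch below is illustrative only. The four-node structure, the seed, and all function names (`random_cpts`, `forward_sample`, `bic`, `w_score`) are hypothetical. It uses plain forward (ancestral) sampling on a small DAG rather than BNGenerator's inverted trees. Because the definition of the W-score is elided above, it assumes that worsening is measured against some reference BIC score, with BIC written as log-likelihood minus penalty so that higher is better.

```python
import math
import random
from collections import Counter

# Hypothetical example DAG (not from the paper): variable -> list of parents.
STRUCTURE = {0: [], 1: [0], 2: [0], 3: [1, 2]}
ORDER = [0, 1, 2, 3]  # a topological order: parents come before children


def config_index(sample, parents):
    """Encode the binary values of `parents` in `sample` as an integer in [0, 2**len(parents))."""
    idx = 0
    for p in parents:
        idx = 2 * idx + sample[p]
    return idx


def random_cpts(structure, rng):
    """For every variable and parent configuration, draw P(X = 1 | config) from Beta(1, 1)."""
    return {v: [rng.betavariate(1.0, 1.0) for _ in range(2 ** len(ps))]
            for v, ps in structure.items()}


def forward_sample(structure, order, cpts, n, rng):
    """Ancestral sampling: each variable is drawn after all of its parents."""
    data = []
    for _ in range(n):
        s = {}
        for v in order:
            p_one = cpts[v][config_index(s, structure[v])]
            s[v] = 1 if rng.random() < p_one else 0
        data.append(s)
    return data


def bic(data, structure):
    """BIC = maximized log-likelihood - (log N / 2) * number of free parameters."""
    n = len(data)
    score = 0.0
    for v, ps in structure.items():
        counts = Counter((config_index(s, ps), s[v]) for s in data)
        for j in range(2 ** len(ps)):
            n_j = counts[(j, 0)] + counts[(j, 1)]
            for x in (0, 1):
                if counts[(j, x)] > 0:
                    score += counts[(j, x)] * math.log(counts[(j, x)] / n_j)
        # a binary variable has one free parameter per parent configuration
        score -= 0.5 * math.log(n) * 2 ** len(ps)
    return score


def w_score(score, reference):
    """Percentage by which `score` is worse than `reference` (assumption: higher BIC is better)."""
    return 100.0 * (reference - score) / abs(reference)


rng = random.Random(0)
cpts = random_cpts(STRUCTURE, rng)
data = forward_sample(STRUCTURE, ORDER, cpts, 10_000, rng)  # 10,000 instances, as in the setup
empty = {v: [] for v in STRUCTURE}  # baseline structure with no arcs
print(f"W-score of the empty graph vs. the true DAG: {w_score(bic(data, empty), bic(data, STRUCTURE)):.2f}%")
```

Dividing by the absolute value of the reference keeps the sign interpretable even though BIC values are negative: a positive W-score means the candidate structure scores worse than the reference.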
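The bounds k = 4 and k ∈ {2, 5, 8} in the Experiment Setup row constrain the treewidth of the learned network, which by the usual convention is the treewidth of its moral graph. The following Python sketch does not reproduce k-A* or k-G, which construct bounded structures; it only checks a given structure after the fact, using the standard greedy min-degree elimination heuristic. The width of any elimination order is an upper bound on treewidth, so a result of at most k certifies the bound, while a larger result is inconclusive. The function names and the example DAG are hypothetical.

```python
from itertools import combinations


def moral_graph(parents):
    """Moralize a DAG given as {node: [parents]}: drop arc directions and marry co-parents."""
    adj = {v: set() for v in parents}
    for v, ps in parents.items():
        for p in ps:
            adj[v].add(p)
            adj[p].add(v)
        for a, b in combinations(ps, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def min_degree_width(adj):
    """Width of a greedy min-degree elimination order: an upper bound on treewidth."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    width = 0
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))
        nbrs = adj.pop(v)
        width = max(width, len(nbrs))
        for a, b in combinations(nbrs, 2):  # turn the neighbourhood into a clique
            adj[a].add(b)
            adj[b].add(a)
        for u in nbrs:
            adj[u].discard(v)
    return width


def certifies_bound(parents, k):
    """True if the heuristic certifies treewidth <= k; False means 'not certified', not 'violated'."""
    return min_degree_width(moral_graph(parents)) <= k


dag = {0: [], 1: [0], 2: [0], 3: [1, 2]}  # hypothetical example DAG
print(min_degree_width(moral_graph(dag)))  # → 2
print(certifies_bound(dag, 4))  # → True
```

For the four-node example, moralization marries the co-parents 1 and 2, the elimination width is 2, and the structure therefore satisfies the k = 4 bound.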