Learning Treewidth-Bounded Bayesian Networks with Thousands of Variables
Authors: Mauro Scanagatta, Giorgio Corani, Cassio P. de Campos, Marco Zaffalon
NeurIPS 2016

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare k-A*, k-G, S2, S2+ and TWILP in various experiments. We compare them through an indicator which we call W-score: the percentage of worsening of the BIC score... For the first time we present experimental results for structural learning with bounded treewidth for domains involving up to ten thousand variables. |
| Researcher Affiliation | Academia | Mauro Scanagatta (IDSIA, SUPSI, USI; Lugano, Switzerland; mauro@idsia.ch); Giorgio Corani (IDSIA, SUPSI, USI; Lugano, Switzerland; giorgio@idsia.ch); Cassio P. de Campos (Queen's University Belfast; Northern Ireland, UK; c.decampos@qub.ac.uk); Marco Zaffalon (IDSIA; Lugano, Switzerland; zaffalon@idsia.ch) |
| Pseudocode | No | The paper describes the algorithms k-A* and k-G in prose but does not provide them in a structured pseudocode or algorithm block. |
| Open Source Code | Yes | Software and supplementary material are available from http://blip.idsia.ch. |
| Open Datasets | Yes | We now present experiments on the data sets considered by Nie et al. (2016). They involve up to 100 variables. We consider 10 large data sets (100 ≤ n ≤ 400) listed in Table 3. Eventually we consider 14 very large data sets, containing between 400 and 10000 variables... three randomly-generated synthetic data sets... generated using the software BNGenerator (http://sites.poli.usp.br/pmr/ltd/Software/BNGenerator/). |
| Dataset Splits | No | The paper does not provide specific train/validation/test dataset split information. It mentions 'a complete data set of N instances D = {D1, ..., DN}' and 'We split each data set randomly into three subsets' for experimental purposes, but these are not train/validation/test splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions the 'Gobnilp solver' and a 'Max SAT solver' without version numbers. It also mentions BNGenerator for synthetic data generation, again without a version number, but this is not a core software dependency of the main methodology. |
| Experiment Setup | Yes | We allow 60 seconds of time for the computation of the scores of the parent set of each variable, in each data set. We allow each method to run for ten minutes. We let each method run for one hour. We set the bounded treewidth to k = 4. We consider the following treewidths: k ∈ {2, 5, 8}. All variables are binary and we sample their conditional probability tables from a Beta(1,1). We sample 10,000 instances from each generated inverted tree. |
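
The synthetic-data step in the Experiment Setup row (binary variables, conditional probability tables drawn from a Beta(1,1), 10,000 sampled instances) can be illustrated with a minimal sketch. This is not the authors' code and does not use BNGenerator. The four-variable structure in `parents`, the function names, and the fixed seed are all hypothetical, chosen only to show the sampling idea, and the structure is not one of the paper's inverted trees.

```python
import itertools
import random


def sample_cpts(parents, rng):
    """For each binary variable, draw P(X=1 | parent configuration) from Beta(1,1)."""
    cpts = {}
    for var, pa in parents.items():
        cpts[var] = {
            config: rng.betavariate(1, 1)  # Beta(1,1) is uniform on [0, 1]
            for config in itertools.product((0, 1), repeat=len(pa))
        }
    return cpts


def topological_order(parents):
    """Return the variables ordered so that every parent precedes its children."""
    order, seen = [], set()

    def visit(var):
        if var in seen:
            return
        seen.add(var)
        for p in parents[var]:
            visit(p)
        order.append(var)

    for var in parents:
        visit(var)
    return order


def sample_instances(parents, cpts, n, rng):
    """Forward-sample n complete instances from the network."""
    order = topological_order(parents)
    data = []
    for _ in range(n):
        row = {}
        for var in order:
            config = tuple(row[p] for p in parents[var])
            row[var] = 1 if rng.random() < cpts[var][config] else 0
        data.append(row)
    return data


# Hypothetical toy structure (assumption): C has parents A and B, D has parent C.
parents = {"A": (), "B": (), "C": ("A", "B"), "D": ("C",)}
rng = random.Random(0)  # fixed seed so the sketch is repeatable
cpts = sample_cpts(parents, rng)
data = sample_instances(parents, cpts, 10_000, rng)
```

Each entry of `data` is a dictionary mapping a variable name to 0 or 1. A structure-learning method would receive only these instances, not `parents` or `cpts`.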