Distributionally Robust Skeleton Learning of Discrete Bayesian Networks
Authors: Yeshu Li, Brian Ziebart
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical study on synthetic and real datasets validates the effectiveness of our method. We conduct experiments on benchmark datasets [Scutari, 2010] and real-world datasets [Malone et al., 2015] perturbed by the following contamination models. Noise-free model: the baseline model without any noise. Huber's contamination model: each sample has a fixed probability ζ of being replaced by a sample drawn from an arbitrary distribution. Independent failure model: each entry of a sample is independently corrupted with probability ζ. We conduct all experiments on a laptop with an Intel Core i7 2.7 GHz processor. (A hedged sketch of these contamination models appears below the table.) |
| Researcher Affiliation | Collaboration | Yeshu Li, Alibaba Group (liyeshu.lys@alibaba-inc.com); Brian D. Ziebart, Department of Computer Science, University of Illinois at Chicago (bziebart@uic.edu) |
| Pseudocode | Yes | The pseudo-code of the greedy algorithm for solving Equation (6) in Wasserstein DRO is illustrated in Algorithm 1 (Greedy Algorithm for the Wasserstein Worst-case Risk). |
| Open Source Code | Yes | Our code is publicly available at https://github.com/DanielLeee/drslbn. |
| Open Datasets | Yes | We conduct experiments on benchmark datasets [Scutari, 2010] and real-world datasets [Malone et al., 2015] perturbed by the following contamination models: |
| Dataset Splits | No | When dealing with real-world datasets, we randomly split the data into two halves for training and testing. The paper mentions training and testing splits, but does not explicitly describe a validation split or provide specific percentages for any splits. (A minimal sketch of the 50/50 split appears below the table.) |
| Hardware Specification | Yes | We conduct all experiments on a laptop with an Intel Core i7 2.7 GHz processor. |
| Software Dependencies | No | For the Wasserstein-based method, we leverage Adam [Kingma and Ba, 2014] to optimize the overall objective... For the KL-based and standard regularization methods, we use the L-BFGS-B [Byrd et al., 1995] optimization method. The paper names the optimization methods Adam and L-BFGS-B, but does not specify the implementing libraries or provide version numbers for any software dependencies. |
| Experiment Setup | Yes | For the Wasserstein-based method, we leverage Adam [Kingma and Ba, 2014] to optimize the overall objective with β1 = 0.9, β2 = 0.990, a learning rate of 1.0, a batch size of 500, a maximum of 200 iterations for optimization and 10 iterations for approximating the worst-case distribution. For the KL-based and standard regularization methods, we use the L-BFGS-B [Byrd et al., 1995] optimization method with default parameters. We set the cardinality of the maximum conditional set to 3 in MMPC. The Bayesian information criterion (BIC) [Neath and Cavanaugh, 2012] score is adopted in the HC algorithm. A random mixture of 20 random Bayesian networks serves as the adversarial distribution for both contamination models. All hyper-parameters are chosen based on the best performance on random Bayesian networks with the same size as the input one. (A hedged optimizer-configuration sketch appears below the table.) |
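
The contamination models quoted in the Research Type row are simple to express directly. The following is a minimal sketch, assuming categorical data stored as an integer NumPy array with a shared number of states per variable; the function names and the uniform adversarial distribution are illustrative assumptions, not the paper's implementation (the paper uses a random mixture of 20 random Bayesian networks as the adversarial distribution).

```python
import numpy as np

def huber_contaminate(data, zeta, n_states, rng=None):
    """Huber's contamination model: each sample (row) is replaced, with
    probability zeta, by a sample from an adversarial distribution.
    The adversary here is uniform over states, purely for illustration."""
    rng = np.random.default_rng() if rng is None else rng
    data = data.copy()
    replace = rng.random(data.shape[0]) < zeta
    data[replace] = rng.integers(0, n_states, size=(replace.sum(), data.shape[1]))
    return data

def independent_failure(data, zeta, n_states, rng=None):
    """Independent failure model: each entry of a sample is independently
    corrupted (resampled uniformly here) with probability zeta."""
    rng = np.random.default_rng() if rng is None else rng
    data = data.copy()
    corrupt = rng.random(data.shape) < zeta
    data[corrupt] = rng.integers(0, n_states, size=int(corrupt.sum()))
    return data
```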
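For the real-world datasets, the reported protocol is a random split into two equal halves for training and testing. A minimal sketch of such a split, using NumPy as an assumed tool since the paper does not name a library:

```python
import numpy as np

def split_half(data, seed=0):
    """Randomly split the rows of `data` into two equal halves
    (training and testing), as described for the real-world datasets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(data.shape[0])
    half = data.shape[0] // 2
    return data[perm[:half]], data[perm[half:]]
```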
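The Experiment Setup row pins down the optimizer hyper-parameters. The snippet below only mirrors those reported settings in code; the placeholder objective, the use of PyTorch for Adam, and the use of SciPy for L-BFGS-B are assumptions, since the paper does not name the libraries.

```python
import torch
from scipy.optimize import minimize

# Wasserstein-based method: Adam with the reported hyper-parameters
# (beta1 = 0.9, beta2 = 0.990, learning rate 1.0); the paper also reports
# a batch size of 500 and 10 inner iterations to approximate the
# worst-case distribution, which are not reproduced in this stand-in loss.
params = torch.zeros(10, requires_grad=True)   # placeholder parameters
optimizer = torch.optim.Adam([params], lr=1.0, betas=(0.9, 0.990))
for _ in range(200):                           # at most 200 outer iterations
    optimizer.zero_grad()
    loss = (params ** 2).sum()                 # stand-in quadratic objective
    loss.backward()
    optimizer.step()

# KL-based and standard regularization methods: L-BFGS-B with default
# parameters, assuming SciPy's implementation.
result = minimize(lambda x: float((x ** 2).sum()), x0=[1.0] * 10, method="L-BFGS-B")
```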