Causal Structure Discovery from Distributions Arising from Mixtures of DAGs

Authors: Basil Saeed, Snigdha Panigrahi, Caroline Uhler

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate our results on synthetic and real data showing that the inferred graph identifies nodes that vary between the different mixture components. As an immediate application, we demonstrate how retrieval of this causal information can be used to cluster samples according to each mixture component. 5. EXPERIMENTS We generated K component DAGs each with |V| = 10 nodes and the same topological ordering from an Erdős-Rényi model... Data was sampled from each DAG... ran the R implementation of FCI from the pcalg library on this synthetic data... Figure 2e shows the normalized SHD averaged over 30 realizations of synthetic datasets.
Researcher Affiliation | Academia | (1) Laboratory for Information and Decision Systems and Institute for Data, Systems and Society, Massachusetts Institute of Technology, Cambridge, MA, USA; (2) Department of Statistics, University of Michigan, Ann Arbor, MI, USA; (3) Department of Biosystems Science and Engineering, ETH Zurich, Switzerland.
Pseudocode | Yes | Algorithm 1: Construction of the marginal ancestral graph
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the methodology described in the paper is openly available.
Open Datasets | Yes | gene expression data from ovarian cancer in K = 2 patient groups (with 93 and 168 observations, respectively) with different survival rates (Tothill et al., 2008). single-cell gene expression data of naive and activated T cells (i.e. K = 2, with 298 and 377 samples, respectively) from Singer et al. (2016).
Dataset Splits | No | The paper describes data sizes and how data was used (e.g., 'ran the R implementation of FCI on this synthetic data') but does not specify explicit train/validation/test splits or cross-validation setup for model training.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, processor types, or memory specifications used for running experiments.
Software Dependencies | No | we ran the R implementation of FCI from the pcalg library
Experiment Setup | Yes | We generated K component DAGs each with |V| = 10 nodes and the same topological ordering from an Erdős-Rényi model with expected degree d = 1.5/K so that the nodes in the M have expected degree less than 1.5. each edge weight (u, v) was sampled uniformly in [-2, -0.25] ∪ [0.25, 2]... The mean for the Gaussian noise was sampled uniformly in [-2, 2] with standard deviation 1. ran the R implementation of FCI from the pcalg library on this synthetic data using Gaussian conditional independence tests (despite the true distribution being a mixture of Gaussians) with threshold α.
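
The synthetic setup quoted in the Experiment Setup row can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code (the paper releases none, and its experiments used R). In particular, the linear structural equation model, the edge probability p = d / (|V| - 1), the per-component noise means, and all function names below are assumptions introduced here; the source only specifies the node count, the shared topological ordering, the expected degree, and the sampling intervals.

```python
import numpy as np


def sample_component_dag(n_nodes, expected_degree, rng):
    """Return a weighted adjacency matrix W with W[u, v] != 0 meaning u -> v.

    W is strictly upper triangular, so every component shares the
    topological ordering 0, 1, ..., n_nodes - 1.
    """
    # Assumption: "expected degree" counts in- plus out-edges per node,
    # which gives an edge probability of d / (n - 1).
    p = expected_degree / (n_nodes - 1)
    mask = np.triu(rng.random((n_nodes, n_nodes)) < p, k=1)
    # Uniform on [-2, -0.25] U [0.25, 2]: a uniform magnitude times a fair random sign.
    magnitude = rng.uniform(0.25, 2.0, size=(n_nodes, n_nodes))
    sign = rng.choice([-1.0, 1.0], size=(n_nodes, n_nodes))
    return mask * sign * magnitude


def sample_from_dag(W, noise_mean, n_samples, rng):
    """Draw samples from an (assumed) linear Gaussian SEM: X = X W + eps."""
    n_nodes = W.shape[0]
    eps = rng.normal(loc=noise_mean, scale=1.0, size=(n_samples, n_nodes))
    # Solving X = X W + eps for X gives X = eps (I - W)^{-1}.
    return eps @ np.linalg.inv(np.eye(n_nodes) - W)


def sample_mixture(K=2, n_nodes=10, n_per_component=500, seed=0):
    """Pool samples from K component DAGs; also return the true labels and DAGs."""
    rng = np.random.default_rng(seed)
    data, labels, dags = [], [], []
    for k in range(K):
        W = sample_component_dag(n_nodes, 1.5 / K, rng)
        # Assumption: noise means are drawn separately for each component.
        mu = rng.uniform(-2.0, 2.0, size=n_nodes)
        data.append(sample_from_dag(W, mu, n_per_component, rng))
        labels.append(np.full(n_per_component, k))
        dags.append(W)
    return np.vstack(data), np.concatenate(labels), dags
```

The pooled matrix returned by `sample_mixture` plays the role of the mixture data on which the paper runs FCI; the component labels are kept only so that a downstream clustering step could be checked against them.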