Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Unified Experiment Design Approach for Cyclic and Acyclic Causal Models

Authors: Ehsan Mokhtarian, Saber Salehkaleybar, AmirEmad Ghassami, Negar Kiyavash

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In Figure 6, we report the number of experiments performed by our proposed method and the accuracy of the learned graphs when the underlying true graphs are generated randomly from SBM(n, p, b). Each point on the plots is reported as the average of 50 runs with a 90% confidence interval. We measured the accuracy of the recovered DGs by normalized structural Hamming distance (SHD/n) and F1-scores, which we formally define in Subsection 7.5.
Researcher Affiliation Academia Ehsan Mokhtarian (EMAIL), School of Computer and Communication Sciences, EPFL, 1015 Lausanne, Switzerland; Saber Salehkaleybar (EMAIL), Leiden Institute of Advanced Computer Science (LIACS), Leiden University, 2333 CA Leiden, Netherlands; Amir Emad Ghassami (EMAIL), Department of Mathematics and Statistics, Boston University, Boston, MA 02215, USA; Negar Kiyavash (EMAIL), College of Management of Technology, EPFL, 1015 Lausanne, Switzerland
Pseudocode Yes Algorithm 1: Learning descendant sets and strongly connected components; Algorithm 2: Learning a DG G
Open Source Code Yes Our codes are available at https://github.com/Ehsan-Mokhtarian/cyclic_experiment_design.
Open Datasets No In an SBM(n, p, b), a graph G with n vertices is generated as follows: the variables are randomly partitioned into ⌈n/b⌉ blocks B_1, ..., B_⌈n/b⌉, where |B_i| = b for 1 ≤ i ≤ ⌈n/b⌉ − 1. For two variables in the same block, there can exist an edge in both directions, each with probability p. For two variables in different blocks, there can be an edge between them with probability p only in one direction. That is, directed edge (X, Y) exists with probability p when X ∈ B_i and Y ∈ B_j, where 1 ≤ i < j ≤ ⌈n/b⌉. This means that the variables in each SCC belong to the same block, and b is a surrogate for ζmax(G). For each graph, synthetic data sets from observational and interventional distributions were generated with a finite number of samples and fed to our proposed algorithm.
Dataset Splits No For each graph, synthetic data sets from observational and interventional distributions were generated with a finite number of samples and fed to our proposed algorithm. The observational samples were generated using a linear SCM where each variable X is a linear combination of its parents plus an exogenous noise variable ϵ_X; the coefficients were chosen uniformly at random from [−1.5, −1] ∪ [1, 1.5], and ϵ_X was generated at random according to N(0, σ_X²), where σ_X is selected uniformly at random from [0.5, 1.5]. To generate interventional samples for an experiment on a subset I ⊆ V, the equation of each variable in V \ I remained unchanged, and the equation of each variable X ∈ I was replaced by X = ϵ_X, where ϵ_X had the same distribution as in the original SCM. No explicit dataset splits for training, validation, or testing are mentioned beyond generating samples from observational and interventional distributions.
Hardware Specification No The paper does not provide specific hardware details such as GPU/CPU models or processor types. It only mentions the use of MATLAB for implementation.
Software Dependencies No For the simulations of this section, we used the structure learning algorithm in Mokhtarian et al. (2022) to learn G_r^obs, as it is scalable to large graphs. ... To color G_r^obs, we applied trail-path algorithm in Bandyopadhyay et al. (2020). To find the descendant sets and the strongly connected components of H in line 9 of Algorithm 1, we used the predefined function conncomp in MATLAB. Finally, we used Fisher Z-transformation with a significance level of 0.01 to perform conditional independence tests. The paper mentions MATLAB and other algorithms but does not provide specific version numbers for any software.
Experiment Setup Yes The observational samples were generated using a linear SCM where each variable X is a linear combination of its parents plus an exogenous noise variable ϵ_X; the coefficients were chosen uniformly at random from [−1.5, −1] ∪ [1, 1.5], and ϵ_X was generated at random according to N(0, σ_X²), where σ_X is selected uniformly at random from [0.5, 1.5]. ... Finally, we used Fisher Z-transformation with a significance level of 0.01 to perform conditional independence tests. ... Figure 6a illustrates the effect of n (number of vertices) and b (the parameter that controls ζmax(G)) when p = log(n)/n (graph density) and the number of samples was fixed at 200n.
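
The synthetic data generation quoted in the Open Datasets, Dataset Splits, and Experiment Setup rows can be sketched in a few lines. The following is a minimal Python/NumPy illustration, not the authors' MATLAB implementation. The function names (`sample_sbm_digraph`, `sample_weights`, `sample_linear_scm`) are hypothetical. Two choices are assumptions that the quoted text does not state: self-loops are excluded, and samples from a cyclic linear SCM are obtained by solving the linear system X = XW + ϵ for its unique solution, which requires I − W to be invertible.

```python
import numpy as np


def sample_sbm_digraph(n, p, b, rng):
    """Draw a directed graph from SBM(n, p, b); A[i, j] = 1 means edge i -> j."""
    perm = rng.permutation(n)
    block = np.empty(n, dtype=int)
    # ceil(n/b) blocks; every block except possibly the last has size b.
    block[perm] = np.arange(n) // b
    coin = rng.random((n, n)) < p                  # independent coin per ordered pair
    same = block[:, None] == block[None, :]        # both directions allowed
    forward = block[:, None] < block[None, :]      # only lower block -> higher block
    A = (coin & (same | forward)).astype(int)
    np.fill_diagonal(A, 0)                         # assumption: no self-loops
    return A, block


def sample_weights(A, rng):
    """Edge coefficients drawn uniformly from [-1.5, -1] U [1, 1.5]."""
    magnitude = rng.uniform(1.0, 1.5, size=A.shape)
    sign = rng.choice([-1.0, 1.0], size=A.shape)
    return A * magnitude * sign


def sample_linear_scm(W, sigma, num_samples, rng, intervened=()):
    """Sample rows x satisfying x = x W + eps, with eps_j ~ N(0, sigma_j^2).

    For each intervened variable the equation is replaced by X = eps_X,
    i.e. all incoming edges of that variable are removed.
    """
    n = W.shape[0]
    W_do = W.copy()
    W_do[:, np.asarray(list(intervened), dtype=int)] = 0.0
    eps = rng.normal(0.0, sigma, size=(num_samples, n))
    # x (I - W) = eps  <=>  (I - W)^T x^T = eps^T
    return np.linalg.solve((np.eye(n) - W_do).T, eps.T).T


rng = np.random.default_rng(0)
n, b = 20, 4
p = np.log(n) / n                                  # density used for Figure 6a
A, block = sample_sbm_digraph(n, p, b, rng)
W = sample_weights(A, rng)
sigma = rng.uniform(0.5, 1.5, size=n)
X_obs = sample_linear_scm(W, sigma, 200 * n, rng)  # 200n samples, as in Figure 6a
X_int = sample_linear_scm(W, sigma, 200 * n, rng, intervened=[0, 3])
```

Because edges between blocks only point from lower-indexed to higher-indexed blocks, every cycle stays inside one block, which is why b bounds the size of each strongly connected component in this sketch.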