Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

A Unified Experiment Design Approach for Cyclic and Acyclic Causal Models

Authors: Ehsan Mokhtarian, Saber Salehkaleybar, AmirEmad Ghassami, Negar Kiyavash

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In Figure 6, we report the number of experiments performed by our proposed method and the accuracy of the learned graphs when the underlying true graphs are generated randomly from SBM(n, p, b). Each point on the plots is reported as the average of 50 runs with a 90% confidence interval. We measured the accuracy of the recovered DGs by normalized structural hamming distance (SHD/n) and F1-scores, which we formally define in Subsection 7.5.
Researcher Affiliation	Academia	Ehsan Mokhtarian EMAIL School of Computer and Communication Sciences EPFL 1015 Lausanne, Switzerland Saber Salehkaleybar EMAIL Leiden Institute of Advanced Computer Science (LIACS) Leiden University 2333 CA Leiden, Netherlands Amir Emad Ghassami EMAIL Department of Mathematics and Statistics, Boston University Boston, MA 02215 USA Negar Kiyavash EMAIL College of Management of Technology EPFL 1015 Lausanne, Switzerland
Pseudocode	Yes	Algorithm 1: Learning descendant sets and strongly connected components Algorithm 2: Learning a DG G
Open Source Code	Yes	Our codes are available at https://github.com/Ehsan-Mokhtarian/cyclic_experiment_design.
Open Datasets	No	In an SBM(n, p, b), a graph G with n vertices is generated as follows: the variables are randomly partitioned into ⌈n/b⌉ blocks: B_1, ..., B_⌈n/b⌉, where |B_i| = b for 1 ≤ i ≤ ⌈n/b⌉ − 1. For two variables in the same block, there can exist an edge in both directions, each with probability p. For two variables in different blocks, there can be an edge between them with probability p only in one direction. That is, directed edge (X, Y) exists with probability p when X ∈ B_i and Y ∈ B_j, where 1 ≤ i < j ≤ ⌈n/b⌉. This means that the variables in each SCC belong to the same block, and b is a surrogate for ζmax(G). For each graph, synthetic data sets from observational and interventional distributions were generated with a finite number of samples and fed to our proposed algorithm.
Dataset Splits	No	For each graph, synthetic data sets from observational and interventional distributions were generated with a finite number of samples and fed to our proposed algorithm. The observational samples were generated using a linear SCM where each variable X is a linear combination of its parents plus an exogenous noise variable ϵ_X; the coefficients were chosen uniformly at random from [−1.5, −1] ∪ [1, 1.5], and ϵ_X was generated at random according to N(0, σ_X²), where σ_X is selected uniformly at random from [0.5, 1.5]. To generate interventional samples for an experiment on a subset I ⊆ V, the equation of each variable in V \ I remained unchanged, and the equation of each variable X ∈ I was replaced by X = ϵ_X, where ϵ_X had the same distribution as in the original SCM. No explicit dataset splits for training, validation, or testing are mentioned beyond generating samples from observational and interventional distributions.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU/CPU models or processor types. It only mentions the use of MATLAB for implementation.
Software Dependencies	No	For the simulations of this section, we used the structure learning algorithm in Mokhtarian et al. (2022) to learn G_r^obs, as it is scalable to large graphs. ... To color G_r^obs, we applied trail-path algorithm in Bandyopadhyay et al. (2020). To find the descendant sets and the strongly connected components of H in line 9 of Algorithm 1, we used the predefined function conncomp in MATLAB. Finally, we used Fisher Z-transformation with a significance level of 0.01 to perform conditional independence tests. The paper mentions MATLAB and other algorithms but does not provide specific version numbers for any software.
Experiment Setup	Yes	The observational samples were generated using a linear SCM where each variable X is a linear combination of its parents plus an exogenous noise variable ϵ_X; the coefficients were chosen uniformly at random from [−1.5, −1] ∪ [1, 1.5], and ϵ_X was generated at random according to N(0, σ_X²), where σ_X is selected uniformly at random from [0.5, 1.5]. ... Finally, we used Fisher Z-transformation with a significance level of 0.01 to perform conditional independence tests. ... Figure 6a illustrates the effect of n (number of vertices) and b (the parameter that controls ζmax(G)) when p = log(n)/n (graph density) and the number of samples was fixed at 200n.
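
The synthetic setup quoted in the table (SBM(n, p, b) graphs, a linear SCM with Gaussian noise, and interventions that replace an equation by X = ϵ_X) can be sketched roughly as follows. This is an illustrative Python/NumPy sketch, not the authors' MATLAB code: the function names `sample_sbm_graph`, `sample_weights`, and `sample_scm` are hypothetical, and solving the linear system for cyclic graphs assumes the matrix I − W is invertible, which the quoted text does not state.

```python
import numpy as np


def sample_sbm_graph(n, p, b, rng):
    """Return (adj, block); adj[i, j] is True when the edge i -> j exists."""
    order = rng.permutation(n)           # random assignment of variables to blocks
    block = np.empty(n, dtype=int)
    block[order] = np.arange(n) // b     # blocks of size b (the last may be smaller)
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Same block: each direction is drawn independently with probability p.
            # Different blocks: only lower-index block -> higher-index block is allowed.
            if block[i] <= block[j] and rng.random() < p:
                adj[i, j] = True
    return adj, block


def sample_weights(adj, rng):
    """Edge coefficients drawn uniformly from [-1.5, -1] U [1, 1.5]."""
    magnitude = rng.uniform(1.0, 1.5, size=adj.shape)
    sign = rng.choice([-1.0, 1.0], size=adj.shape)
    return np.where(adj, sign * magnitude, 0.0)


def sample_scm(weights, sigma, num_samples, rng, intervened=()):
    """Draw samples from the linear SCM; variables in `intervened` are set to noise."""
    n = weights.shape[0]
    w = weights.copy()
    cols = np.asarray(list(intervened), dtype=int)
    w[:, cols] = 0.0                     # intervened variables drop their parents
    noise = rng.normal(0.0, sigma, size=(num_samples, n))
    # Each row x satisfies x = x @ w + noise, i.e. (I - w)^T x^T = noise^T.
    # Assumption: I - w is invertible (needed when the graph has cycles).
    return np.linalg.solve((np.eye(n) - w).T, noise.T).T


# Example with the quoted settings: p = log(n)/n and 200n samples.
rng = np.random.default_rng(0)
n, b = 20, 4
adj, block = sample_sbm_graph(n, np.log(n) / n, b, rng)
weights = sample_weights(adj, rng)
sigma = rng.uniform(0.5, 1.5, size=n)    # one noise scale per variable
obs = sample_scm(weights, sigma, 200 * n, rng)
exp = sample_scm(weights, sigma, 200 * n, rng, intervened=[0, 3])
```

Storing the coefficient of parent i in the equation of child j at `weights[i, j]` lets an intervention be expressed by zeroing the columns of the intervened variables, which mirrors the description of replacing their equations while leaving all others unchanged.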