Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Survey and Evaluation of Causal Discovery Methods for Time Series
Authors: Charles K. Assaad, Emilie Devijver, Eric Gaussier
JAIR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present in this section an experimental comparison of the major causal discovery methods we have reviewed. To do so, we first describe the selected evaluation measures and discuss the retained methods as well as the artificial datasets corresponding to basic causal structures and the standard real world benchmark we have considered. We then present the results of all experiments. |
| Researcher Affiliation | Collaboration | Charles K. Assaad, Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, Easy Vista, 38000 Grenoble, France; Emilie Devijver and Eric Gaussier, Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, 38000 Grenoble, France |
| Pseudocode | Yes | Algorithm 1 PWGC... Algorithm 2 MVGC... Algorithm 3 TCDF... Algorithm 4 PCMCI... Algorithm 5 oCSE... Algorithm 6 tsFCI... Algorithm 7 VarLiNGAM... Algorithm 8 TiMINo... Algorithm 9 DYNOTEARS |
| Open Source Code | Yes | Our implementations of PWGC and oCSE are available at https://github.com/ckassaad/causal_discovery_for_time_series; furthermore, all methods can be used through a Python routine available at https://github.com/ckassaad/causal_discovery_for_time_series. ... for MVGC the code available at http://www.sussex.ac.uk/sackler/mvgc/. In addition, we include in our comparison TCDF and rely for this method on the implementation available at https://github.com/M-Nauta/TCDF. ... Both scores are available in the implementation provided at https://github.com/jakobrunge/tigramite. We also include oCSE, which we implemented. ... Finally, we also consider tsFCI, with the implementation provided at https://sites.google.com/site/dorisentner/publications/tsfci ... Among the noise-based approaches (Section 5), we retain VarLiNGAM and TiMINo, which are respectively available at https://github.com/cdt15/lingam and http://web.math.ku.dk/~peters/code.html. For VarLiNGAM, the regularization parameter in the adaptive Lasso is selected using BIC, and no statistical test is performed as we directly use the value of the statistics. TiMINo uses a test based on cross-correlation that can be derived from Brockwell and Davis (1986, Thm 11.2.3). We have retained the most recent score-based method, namely DYNOTEARS (Pamfil et al., 2020), available at https://github.com/quantumblacklabs/causalnex |
| Open Datasets | Yes | The artificial datasets, available at https://dataverse.harvard.edu/dataverse/basic_causal_structures_additive_noise, correspond to five basic causal structures presented in Table 4: fork, v-structure, mediator, diamond, as well as to a nine nodes structure introduced by Spirtes et al. (2001) and referred to as 7ts2h. ... The real-world benchmark we have retained here is FMRI (Functional Magnetic Resonance Imaging) which contains BOLD (Blood-oxygen-level dependent) datasets for 28 different underlying brain networks (Smith et al., 2011). Each dataset contains the neural activity, based on the change of blood flow, of at most 50 different regions. Each region corresponds to a time series which contains between 50 and 5000 time points. ... Original data: https://www.fmrib.ox.ac.uk/datasets/netsim/index.html Preprocessed version: https://github.com/M-Nauta/TCDF/tree/master/data/fMRI |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. For artificial datasets, it states: "For each structure and each length, we generate 10 different datasets over which the performance of each method is averaged." For the FMRI dataset, it mentions using 27 networks but no specific splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions several tools and codebases used (e.g., MVGC code, TCDF implementation, tigramite, VarLiNGAM, TiMINo, DYNOTEARS, and custom PWGC/oCSE implementations) and provides links to them. However, it does not specify version numbers for these software components or any underlying programming languages/libraries (like Python, PyTorch, etc.) that would be needed for replication. |
| Experiment Setup | Yes | For the hyper-parameters in this latter method [TCDF], we used the values suggested by the authors: a kernel of size 4, a dilation coefficient equal to 4, 1 hidden layer, a learning rate of 0.01, and 5000 epochs. ... For VarLiNGAM, the regularization parameter in the adaptive Lasso is selected using BIC... [DYNOTEARS] the hyperparameters of which are set to their recommended values (λ_W = λ_A = 0.05 and α_W = α_A = 0.01). For all the methods, when doing a statistical test, we use a significance level of 0.05. |
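To make the table's references to PWGC concrete: pairwise Granger causality declares that x "Granger-causes" y when adding lagged values of x to an autoregression of y significantly reduces the prediction error. The sketch below is an illustrative pure-Python toy, not the authors' implementation (their code is in the linked repository); all function names here are invented, and it computes only the F-like statistic rather than performing the full significance test at the 0.05 level mentioned above.

```python
import random

def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def _ols_ssr(rows, y):
    """Sum of squared residuals of an OLS fit y ~ rows (with intercept)."""
    X = [[1.0] + row for row in rows]
    k = len(X[0])
    XtX = [[sum(xi[a] * xi[b] for xi in X) for b in range(k)] for a in range(k)]
    Xty = [sum(xi[a] * yi for xi, yi in zip(X, y)) for a in range(k)]
    beta = _solve(XtX, Xty)
    return sum((yi - sum(bc * xc for bc, xc in zip(beta, xi))) ** 2
               for xi, yi in zip(X, y))

def granger_stat(x, y, lag=1):
    """F-like statistic: does adding lagged x help predict y beyond y's own past?"""
    rows_restricted, rows_full, target = [], [], []
    for t in range(lag, len(y)):
        y_lags = [y[t - l] for l in range(1, lag + 1)]
        x_lags = [x[t - l] for l in range(1, lag + 1)]
        rows_restricted.append(y_lags)
        rows_full.append(y_lags + x_lags)
        target.append(y[t])
    ssr_r = _ols_ssr(rows_restricted, target)
    ssr_f = _ols_ssr(rows_full, target)
    n = len(target)
    return ((ssr_r - ssr_f) / lag) / (ssr_f / (n - 2 * lag - 1))

# Simulate a ground-truth causal link x -> y and score both directions.
random.seed(0)
n = 300
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(0.8 * x[t - 1] + 0.1 * random.gauss(0.0, 1.0))

s_xy = granger_stat(x, y)  # should be large: past x predicts y
s_yx = granger_stat(y, x)  # should be small: past y does not predict x
```

In a real pipeline the statistic would be compared against an F-distribution quantile at the paper's significance level of 0.05; with multivariate Granger causality (MVGC), the conditioning set additionally includes the lags of all other observed series.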