Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

CausalTime: Realistically Generated Time-series for Benchmarking of Causal Discovery

Authors: Yuxiao Cheng, Ziqian Wang, Tingxiong Xiao, Qin Zhong, Jinli Suo, Kunlun He

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In the experiments, we validate the fidelity of the generated data through qualitative and quantitative experiments, followed by a benchmarking of existing TSCD algorithms using these generated datasets.
Researcher Affiliation	Collaboration	(1) Department of Automation, Tsinghua University; (2) Institute for Brain and Cognitive Science, Tsinghua University (THUIBCS); (3) Chinese PLA General Hospital
Pseudocode	Yes	A.4 ALGORITHMIC REPRESENTATION FOR CAUSALTIME PIPELINE: We show the detailed algorithmic representation of our proposed data generation pipeline in Algorithm A.4, where we exclude quality control and TSCD evaluation steps. Algorithm 1: Pipeline for CausalTime Generation (Excluding quality control and TSCD evaluation) ...
Open Source Code	Yes	For the purpose of reproducibility, we include the source code on GitHub (https://github.com/jarrycyx/UNN).
Open Datasets	Yes	Air Quality Index (AQI) is a subset of several air quality features from 36 monitoring stations spread across Chinese cities... Traffic subset is built from the time-series collected by traffic sensors in the San Francisco Bay Area. Medical subset is from MIMIC-4, which is a database that provides critical care data for over 40,000 patients admitted to intensive care units (Johnson et al., 2023).
Dataset Splits	No	To ensure fairness, we searched for the best set of hyperparameters for these baseline algorithms on the validation dataset, and tested performances on testing sets for 5 random seeds per experiment.
Hardware Specification	Yes	All experiments are deployed on a server with Intel Core CPU and NVIDIA RTX3090 GPU.
Software Dependencies	No	The paper mentions software like 'scikit-learn package' for dimension reduction and references implementations for Normalizing Flow and Deep SHAP, but does not provide specific version numbers for these or other software dependencies like Python or PyTorch.
Experiment Setup	Yes	Table 4: Hyperparameters for time-series fitting. Table 5: Hyperparameter settings of the baseline causal discovery and data imputation algorithms. We show the detailed algorithmic representation of our proposed data generation pipeline in Algorithm A.4, where we exclude quality control and TSCD evaluation steps.
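
The Software Dependencies row above is marked No because no version numbers are reported. As a minimal sketch of how such versions could be recorded alongside an experiment, the Python snippet below queries the installed versions of a list of packages. It is not taken from the paper or its repository; the function names `capture_versions` and `format_requirements` and the package names in the usage note are illustrative assumptions.

```python
import platform
from importlib import metadata


def capture_versions(packages):
    """Map each package name to its installed version, or None if not installed.

    The interpreter version is stored under the key "python".
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return versions


def format_requirements(versions):
    """Render pinned 'name==version' lines, skipping Python and missing packages."""
    lines = []
    for name, ver in sorted(versions.items()):
        if name == "python" or ver is None:
            continue
        lines.append(f"{name}=={ver}")
    return "\n".join(lines)
```

For example, `format_requirements(capture_versions(["scikit-learn", "torch"]))` would produce pinned lines for whichever of those packages are installed in the current environment; the choice of packages here is hypothetical.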