Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

CausalRivers - Scaling up benchmarking of causal discovery for real-world time-series

Authors: Gideon Stein, Maha Shadaydeh, Jan Blunk, Niklas Penzel, Joachim Denzler

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To demonstrate the utility of Causal Rivers, we evaluate several causal discovery approaches through a set of experiments to identify areas for improvement.
Researcher Affiliation	Academia	Gideon Stein, Maha Shadaydeh, Jan Blunk, Niklas Penzel, Joachim Denzler Computer Vision Group Jena Friedrich Schiller University Jena Jena, Thuringia 07743, Germany EMAIL
Pseudocode	No	The paper describes the baseline strategies (CC, RP, Combo) in prose, detailing the logic and conditions, but does not provide structured pseudocode or algorithm blocks.
Open Source Code	Yes	To make usage as accessible as possible, we provide a ready-to-use benchmark package with many features. (Footnote 2: https://github.com/causalrivers)
Open Datasets	Yes	To bridge this gap, we introduce Causal Rivers, the largest in-the-wild causal discovery benchmarking kit for time-series data to date. Causal Rivers features an extensive dataset on river discharge... (Footnote 1: https://causalrivers.github.io)
Dataset Splits	Yes	Instead, we provide sampling strategies to generate thousands of subgraphs with a flexible amount of nodes and unique graph characteristics such as single-sink nodes, root causes, hidden-confounding, or simply connected graphs. ... We used Rivers Bavaria and sampled training and validation examples (identical to the strategy Random-5) to finetune a pre-trained network provided by Stein et al. (2024).
Hardware Specification	No	The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies	No	The paper mentions several methods (e.g., PCMCI, VARLiNGAM, DYNOTEARS, CDMI, CP) but does not provide specific version numbers for any software libraries, packages, or programming languages used.
Experiment Setup	Yes	As causal discovery methods typically come with at least some Hyperparameters, we performed a rudimentary Hyperparameter search per method which we document in appendix A.2. ... We performed a small Hyperparameter search, testing for different values of the learning rate, weight decay, batch size, time-series resolution, normalization, and the CP architecture.