Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
CausalRivers - Scaling up benchmarking of causal discovery for real-world time-series
Authors: Gideon Stein, Maha Shadaydeh, Jan Blunk, Niklas Penzel, Joachim Denzler
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the utility of Causal Rivers, we evaluate several causal discovery approaches through a set of experiments to identify areas for improvement. |
| Researcher Affiliation | Academia | Gideon Stein, Maha Shadaydeh, Jan Blunk, Niklas Penzel, Joachim Denzler Computer Vision Group Jena Friedrich Schiller University Jena Jena, Thuringia 07743, Germany EMAIL |
| Pseudocode | No | The paper describes the baseline strategies (CC, RP, Combo) in prose, detailing the logic and conditions, but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | To make usage as accessible as possible, we provide a ready-to-use benchmark package with many features [2]. [2] https://github.com/causalrivers |
| Open Datasets | Yes | To bridge this gap, we introduce Causal Rivers [1], the largest in-the-wild causal discovery benchmarking kit for time-series data to date. Causal Rivers features an extensive dataset on river discharge... [1] https://causalrivers.github.io |
| Dataset Splits | Yes | Instead, we provide sampling strategies to generate thousands of subgraphs with a flexible amount of nodes and unique graph characteristics such as single-sink nodes, root causes, hidden-confounding, or simply connected graphs. ... We used Rivers Bavaria and sampled training and validation examples (identical to the strategy Random-5) to finetune a pre-trained network provided by Stein et al. (2024). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions several methods (e.g., PCMCI, Varlingam, Dynotears, CDMI, CP) but does not provide specific version numbers for any software libraries, packages, or programming languages used. |
| Experiment Setup | Yes | As causal discovery methods typically come with at least some Hyperparameters, we performed a rudimentary Hyperparameter search per method which we document in appendix A.2. ... We performed a small Hyperparameter search, testing for different values of the learning rate, weight decay, batch size, time-series resolution, normalization, and the CP architecture. |