Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning domain-specific causal discovery from time series

Authors: Xinyue Wang, Konrad Kording

TMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our findings indicate that this procedure significantly outperforms human-designed, domain-agnostic causal discovery methods, such as Mutual Information, VAR-LiNGAM, and Granger Causality on the MOS 6502 microprocessor, the NetSim fMRI dataset, and the Dream3 gene dataset.
Researcher Affiliation Academia Xinyue Wang (EMAIL), Department of Bioengineering, University of Pennsylvania; Konrad Kording (EMAIL), Department of Bioengineering, University of Pennsylvania
Pseudocode Yes Algorithm 1 Adjacency Matrix Generation for M Periods
Open Source Code Yes Code Availability: Code for simulation and learning is available at (https://github.com/KordingLab/Learning_Causal_Discovery).
Open Datasets Yes Our findings indicate that this procedure significantly outperforms human-designed, domain-agnostic causal discovery methods, such as Mutual Information, VAR-LiNGAM, and Granger Causality on the MOS 6502 microprocessor, the NetSim fMRI dataset, and the Dream3 gene dataset.
Dataset Splits Yes For each period, we eliminate duplicates and use the first 1755 transistors as training transistors, while employing the remaining 1755 as test transistors. [...] Next, we use the test transistors in the first five periods (0–128, 128–256, 384–512, 512–640, 640–768) as five testing sets and the sixth period (768–896) as the validation set. Training transistors in the remaining periods serve as the training set.
Hardware Specification No The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments.
Software Dependencies No The paper mentions several software implementations like Sklearn, LiNGAM, causal-learn, CausalNex, tigramite, and Grad-CAM, but does not provide specific version numbers for any of them.
Experiment Setup Yes Input: We set sequence length L to be 3840 to align with the original 128 half-clock sequences. The window length is 32 with no overlapping, and the dimension of the window embedding C is 192. Encoder: We use a Transformer with depth 8, hidden size 768, and 3 attention heads as the sequence encoder. ... Optimization: Pooler output is regarded as P(Causal|X) and compared with the adjacency matrix to acquire focal loss. We use focal loss (Lin et al., 2017) instead of regular binary cross-entropy loss, with α = 0.7 and γ = 3, because the samples are imbalanced. We optimize the network with AdamW, learning rate 0.001, weight decay 0.05, and batch size 1024 for 100 epochs. The learning rate is adjusted by a cosine annealing scheduler for the remainder of training.
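The focal loss quoted in the setup above (α = 0.7, γ = 3, applied to P(Causal|X) against the binary adjacency matrix) can be sketched in plain Python. This is a generic reimplementation of Lin et al.'s binary focal loss for a single scalar prediction, not the authors' code; the function name and scalar signature are illustrative assumptions.

```python
import math

def focal_loss(p, y, alpha=0.7, gamma=3.0, eps=1e-8):
    """Binary focal loss (Lin et al., 2017) for one prediction.

    p     -- predicted probability P(Causal|X), in [0, 1]
    y     -- ground-truth adjacency entry, 0 or 1
    alpha -- class-balance weight (0.7 in the quoted setup)
    gamma -- focusing exponent that down-weights easy examples (3 in the quoted setup)
    """
    pt = p if y == 1 else 1.0 - p          # probability assigned to the true class
    at = alpha if y == 1 else 1.0 - alpha  # alpha for positives, 1 - alpha for negatives
    # (1 - pt)^gamma shrinks the loss on well-classified (easy) examples,
    # which is why the authors prefer it over plain BCE on imbalanced edges.
    return -at * (1.0 - pt) ** gamma * math.log(max(pt, eps))
```

With γ = 3, a confident correct prediction (p = 0.9, y = 1) incurs roughly a thousandth of the loss of an uncertain one (p = 0.5), so training gradient is dominated by the rare, hard causal edges rather than the abundant non-edges.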