Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Effective and Efficient Time-Varying Counterfactual Prediction with State-Space Models
Authors: Haotian Wang, Haoxuan Li, Hao Zou, Haoang Chi, Long Lan, Wanrong Huang, Wenjing Yang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted extensive experiments on both synthetic and real-world datasets, demonstrating that Mamba-CDSP not only outperforms baselines by a large margin, but also exhibits prominent running efficiency. |
| Researcher Affiliation | Academia | 1College of Computer Science and Technology, National University of Defense Technology 2Center for Data Science, Peking University 3Tsinghua University 4Intelligent Game and Decision Lab |
| Pseudocode | No | The paper describes methods and theoretical analyses using mathematical equations, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to a code repository. |
| Open Datasets | Yes | Following common practice in benchmarking for counterfactual inference, all the methods are validated on three datasets, including the synthetic tumor growth data (Geng et al., 2017), the MIMIC-III-based semi-synthetic data (Melnychuk et al., 2022; Schulam & Saria, 2017), the MIMIC-III real-world data (Johnson et al., 2016). ... The M5 Forecasting dataset, as cited in (Huang et al., 2024), comprises daily transaction data from Walmart stores across three U.S. states... |
| Dataset Splits | Yes | For the tumor-growth synthetic dataset, ... for each γ, we simulate 10,000 patients for training, 1,000 for validation, and 1,000 for testing. ... By setting da = 3 and dy = 2, the cohort of 1,000 patients is split into train/validation/test subsets via a ratio of 60% / 20% / 20%. ... The train/validation/test subsets are split with the ratio of 70%/15%/15%. |
| Hardware Specification | Yes | Experiments are carried out on 1 NVIDIA GeForce RTX 3090 GPU |
| Software Dependencies | No | The paper mentions various models and architectures (e.g., Mamba, Transformer, LSTM, RNNs) but does not specify any particular software libraries or tools with their version numbers that were used for implementation. |
| Experiment Setup | Yes | Table 4: Ranges for hyperparameter tuning across experiments. Here, we distinguish (1) data using the tumor growth (TG) simulator (experiments with fully-synthetic data), (2) data from the semi-synthetic benchmark, and (3) real-world MIMIC-III data. EL refers to the embedding layer, and PL refers to the projection layer. Mamba-CDSP hyperparameters (TG simulator / Semi-Synthetic / Real-world): Mamba blocks (B): 1 / 1 / 2; Learning rate (η): {0.0005, 0.001, 0.01} in all settings; Minibatch size: 128 / 64 / 64; De-correlation parameter: 1 / 1 / 1; EL hidden units (d_EL): 32 / 32 / 64; PL hidden units (d_PL): 32 / 32 / 64; Dropout rate (p): 0.1 / 0.1 / 0.1; EMA of model weights: 0.99 / 0.99 / 0.99; Input size: da + dx + dy + dv; Output size: dy. |
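For readers attempting reproduction, the reported Table 4 ranges can be sketched as a configuration dictionary. This is a minimal illustration only: the key names are assumptions introduced here, not identifiers from the paper; the values are copied from the table above.

```python
# Hedged reconstruction of the reported hyperparameter table for Mamba-CDSP.
# Keys are illustrative; the three per-setting entries correspond to the
# TG-simulator, semi-synthetic, and real-world MIMIC-III columns.
MAMBA_CDSP_HPARAMS = {
    "mamba_blocks": {"tg": 1, "semi_synthetic": 1, "real_world": 2},
    "learning_rate_grid": [0.0005, 0.001, 0.01],  # same tuning grid in all settings
    "minibatch_size": {"tg": 128, "semi_synthetic": 64, "real_world": 64},
    "decorrelation_parameter": 1,                 # identical across settings
    "el_hidden_units": {"tg": 32, "semi_synthetic": 32, "real_world": 64},
    "pl_hidden_units": {"tg": 32, "semi_synthetic": 32, "real_world": 64},
    "dropout_rate": 0.1,
    "ema_decay": 0.99,                            # EMA of model weights
}

def hparams_for(setting: str) -> dict:
    """Resolve the per-setting values (grid entries left as lists to tune over)."""
    return {
        k: (v[setting] if isinstance(v, dict) else v)
        for k, v in MAMBA_CDSP_HPARAMS.items()
    }
```

The input and output sizes (da + dx + dy + dv and dy) are left out of the sketch, since they depend on dataset-specific dimensions not fully specified here.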