Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Non-Stationary Structural Causal Bandits

Authors: Yeahoon Kwon, Yesong Choe, Soungmin Park, Neil Dhir, Sanghack Lee

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical results validate the effectiveness of our approach, demonstrating improved performance over myopic baselines. [...] We empirically evaluate POMIS+ across three non-stationary tasks, demonstrating its superiority over the myopic baseline in regret and optimal arm selection by capturing long-term causal effects through temporally-aware interventions (§8).
Researcher Affiliation	Collaboration	(1) Graduate School of Data Science, Seoul National University; (2) Focused Energy Inc.
Pseudocode	Yes	Algorithm 1 Computing all intervention sequences
Open Source Code	Yes	A Python implementation can be found at: https://github.com/yeahoon-k/NS-SCMMAB.
Open Datasets	No	We conduct experiments on three settings, each designed to highlight different aspects of temporal intervention planning. Detailed specifications for each task are provided in App. I. [...] In this section, we provide detailed specifications of the SCMs used in our experiments to ensure reproducibility.
Dataset Splits	No	In this section, we provide detailed specifications of the SCMs used in our experiments to ensure reproducibility. For each experiment, the simulation was repeated 200 times using the corresponding SCM. [...] Each experimental trial corresponds to a complete causal rollout, where outcomes from all time steps are aggregated into a single episode-level reward.
Hardware Specification	Yes	All experiments were run on a dual-socket Intel Xeon Gold 5317 system with 24 physical cores (48 logical threads) at 3.0GHz.
Software Dependencies	No	A Python implementation can be found at: https://github.com/yeahoon-k/NS-SCMMAB. [...] We report two metrics, cumulative regret (CR) and optimal arm selection probability (OAP), under two MAB solvers: Thompson Sampling (TS) [Thompson, 1933] and KL-UCB [Cappé et al., 2013].
Experiment Setup	Yes	In this section, we provide detailed specifications of the SCMs used in our experiments to ensure reproducibility. For each experiment, the simulation was repeated 200 times using the corresponding SCM. All experiments were run on a dual-socket Intel Xeon Gold 5317 system with 24 physical cores (48 logical threads) at 3.0GHz.
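
The table quotes two evaluation metrics, cumulative regret (CR) and optimal arm selection probability (OAP), averaged over repeated simulations under Thompson Sampling. As a rough illustration of how such metrics are typically computed, the sketch below runs Thompson Sampling on a plain stationary Bernoulli bandit. It is not the paper's POMIS+ method and does not use its SCMs or its repository; the function names, arm means, and horizon are hypothetical, and only the repeat count of 200 is taken from the quoted setup.

```python
import random


def run_thompson_sampling(arm_means, horizon, rng):
    """Run Bernoulli Thompson Sampling once on a stationary bandit.

    Returns two lists of length `horizon`: the cumulative (expected)
    regret after each step, and a 0/1 flag for whether the optimal arm
    was pulled at that step. This is an illustrative toy, not the
    paper's setup.
    """
    n_arms = len(arm_means)
    best_arm = max(range(n_arms), key=lambda a: arm_means[a])
    # Beta(1, 1) prior on each arm's success probability.
    successes = [1] * n_arms
    failures = [1] * n_arms
    regret = 0.0
    cumulative_regret = []
    picked_optimal = []
    for _ in range(horizon):
        # Sample a plausible mean per arm and play the arm with the largest sample.
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < arm_means[arm] else 0
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        # Regret is measured against the best arm's expected reward.
        regret += arm_means[best_arm] - arm_means[arm]
        cumulative_regret.append(regret)
        picked_optimal.append(1 if arm == best_arm else 0)
    return cumulative_regret, picked_optimal


def evaluate(arm_means, horizon, n_runs=200, seed=0):
    """Average CR and OAP per time step over `n_runs` independent runs."""
    rng = random.Random(seed)
    mean_cr = [0.0] * horizon
    oap = [0.0] * horizon
    for _ in range(n_runs):
        cr, opt = run_thompson_sampling(arm_means, horizon, rng)
        for t in range(horizon):
            mean_cr[t] += cr[t] / n_runs
            oap[t] += opt[t] / n_runs
    return mean_cr, oap


if __name__ == "__main__":
    mean_cr, oap = evaluate([0.2, 0.5, 0.8], horizon=500)
    print(f"final mean cumulative regret: {mean_cr[-1]:.2f}")
    print(f"final optimal arm selection probability: {oap[-1]:.2f}")
```

Fixing the seed makes the averaged curves repeatable across runs, which is the property the Experiment Setup row is assessing.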