Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Non-Stationary Structural Causal Bandits

Authors: Yeahoon Kwon, Yesong Choe, Soungmin Park, Neil Dhir, Sanghack Lee

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results validate the effectiveness of our approach, demonstrating improved performance over myopic baselines. [...] We empirically evaluate POMIS+ across three non-stationary tasks, demonstrating its superiority over the myopic baseline in regret and optimal arm selection by capturing long-term causal effects through temporally-aware interventions (§8).
Researcher Affiliation | Collaboration | (1) Graduate School of Data Science, Seoul National University; (2) Focused Energy Inc.
Pseudocode | Yes | Algorithm 1: Computing all intervention sequences
Open Source Code | Yes | A Python implementation can be found at: https://github.com/yeahoon-k/NS-SCMMAB.
Open Datasets | No | We conduct experiments on three settings, each designed to highlight different aspects of temporal intervention planning. Detailed specifications for each task are provided in App. I. [...] In this section, we provide detailed specifications of the SCMs used in our experiments to ensure reproducibility.
Dataset Splits | No | In this section, we provide detailed specifications of the SCMs used in our experiments to ensure reproducibility. For each experiment, the simulation was repeated 200 times using the corresponding SCM. [...] Each experimental trial corresponds to a complete causal rollout, where outcomes from all time steps are aggregated into a single episode-level reward.
Hardware Specification | Yes | All experiments were run on a dual-socket Intel Xeon Gold 5317 system with 24 physical cores (48 logical threads) at 3.0GHz.
Software Dependencies | No | A Python implementation can be found at: https://github.com/yeahoon-k/NS-SCMMAB. [...] We report two metrics, cumulative regret (CR) and optimal arm selection probability (OAP), under two MAB solvers: Thompson Sampling (TS) [Thompson, 1933] and KL-UCB [Cappé et al., 2013].
Experiment Setup | Yes | In this section, we provide detailed specifications of the SCMs used in our experiments to ensure reproducibility. For each experiment, the simulation was repeated 200 times using the corresponding SCM. All experiments were run on a dual-socket Intel Xeon Gold 5317 system with 24 physical cores (48 logical threads) at 3.0GHz.
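
For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how cumulative regret (CR) and optimal arm selection probability (OAP) can be estimated by averaging repeated Thompson Sampling runs. It is not the authors' implementation and does not model their non-stationary SCMs: it assumes a plain stationary Bernoulli bandit, and the function names, arm means, and horizon are all hypothetical choices made for illustration. KL-UCB is not shown.

```python
import random


def thompson_sampling_run(arm_means, horizon, rng):
    """Run Bernoulli Thompson Sampling once (illustrative, stationary arms).

    Returns the cumulative regret after each step and a 0/1 flag per step
    indicating whether the optimal arm was pulled.
    """
    n_arms = len(arm_means)
    best_mean = max(arm_means)
    best_arm = arm_means.index(best_mean)
    successes = [0] * n_arms
    failures = [0] * n_arms
    cumulative_regret = []
    optimal_pulled = []
    total_regret = 0.0
    for _ in range(horizon):
        # Draw one sample per arm from its Beta(1 + successes, 1 + failures) posterior.
        samples = [
            rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(n_arms)
        ]
        arm = max(range(n_arms), key=samples.__getitem__)
        # Simulate a Bernoulli reward from the (hypothetical) true arm mean.
        reward = 1 if rng.random() < arm_means[arm] else 0
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        # Regret is measured against always pulling the best arm.
        total_regret += best_mean - arm_means[arm]
        cumulative_regret.append(total_regret)
        optimal_pulled.append(1 if arm == best_arm else 0)
    return cumulative_regret, optimal_pulled


def evaluate(arm_means, horizon, n_repeats, seed=0):
    """Average CR and OAP per time step over n_repeats independent runs."""
    rng = random.Random(seed)
    mean_cr = [0.0] * horizon
    oap = [0.0] * horizon
    for _ in range(n_repeats):
        cr, opt = thompson_sampling_run(arm_means, horizon, rng)
        for t in range(horizon):
            mean_cr[t] += cr[t] / n_repeats
            oap[t] += opt[t] / n_repeats
    return mean_cr, oap
```

Calling `evaluate([0.2, 0.5, 0.8], horizon=500, n_repeats=200)` would mirror the 200 repetitions mentioned in the table; the arm means and horizon in that call are assumptions, not values taken from the paper.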