Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Structural Causal Bandits under Markov Equivalence

Authors: Min Woo Park, Andy Arditi, Elias Bareinboim, Sanghack Lee

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate the cumulative regrets (CR) of SCM-MAB under different strategies to assess the effect of employing POMIS for PAGs (Fig. 7). The number of trials is set to 10,000 for Tasks 1 and 2, and 5,000 for Task 3, which is sufficient to observe performance differences. Each simulation is repeated 1,000 times to obtain consistent results.
Researcher Affiliation	Academia	Min Woo Park¹, Andy Arditi², Elias Bareinboim², Sanghack Lee¹ (¹Seoul National University, ²Columbia University)
Pseudocode	Yes	Algorithm 1: Identify whether a given set is a POMIS for a PAG. 1: function IsPOMIS(P, Y, X); Input: P: PAG, Y: reward, X: intervention set. 2: if the given X does not satisfy Thm. 2 then return False. 3: Let Q_X be a PMG oriented from P with X according to Prop. 7. 4: return subIsPOMIS(Q_X, X ∪ {Y}, Y, X). 5: function subIsPOMIS(Q, 𝐀, Y, X). 6: if 𝐀 is empty then return IB(Q, Y, X) = X. 7: A ← pick a node from 𝐀. 8: for each set C_A^Q ⊆ {V ∈ Adj(A)_Q | A V} do 9: if C_A^Q satisfies Thm. 7 (i.e., check validity of local transformation) and Y PossDe(A)_{Q \ C_A^Q} then 10: let Q′ be the PMG obtained by orienting the circle marks around A following C_A^Q and completing the orientation rules from Q. 11: if subIsPOMIS(Q′, 𝐀 \ {A}, Y, X) then return True. 12: return False.
Open Source Code	Yes	5. Open access to data and code. Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes]. Justification: The provided code includes sufficient instructions to reproduce all of our results.
Open Datasets	No	The underlying model mechanisms are randomly generated by combining binary logical operations, and the exogenous variables are set to follow Bernoulli distributions whose parameters are randomly selected over (0, 1).
Dataset Splits	No	The number of trials is set to 10,000 for Tasks 1, 2, and 4; 5,000 for Task 3; and 2,000 for Tasks 5 and 6, which is sufficient to observe performance differences among action spaces. The number of trials is selected such that the cumulative regret with respect to POMIS stabilizes across 1000 repeated runs. Each simulation is repeated 1,000 times to obtain consistent results.
Hardware Specification	Yes	The simulations were conducted on a Linux server equipped with an Intel Xeon Gold 5317 processor running at 3.0 GHz and 64 GB of RAM. No GPUs were used during the simulations.
Software Dependencies	No	We compare three arm-selection strategies: POMISs (pink), DMISs (purple), and Brute-force (BF; green), each combined with two prominent solvers: Thompson Sampling (TS) and KL-UCB.
Experiment Setup	Yes	The underlying model mechanisms are randomly generated by combining binary logical operations, and the exogenous variables are set to follow Bernoulli distributions whose parameters are randomly selected over (0, 1). ... The number of trials is set to 10,000 for Tasks 1, 2, and 4; 5,000 for Task 3; and 2,000 for Tasks 5 and 6...
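The experiment setup quoted in the table (randomly generated binary mechanisms, Bernoulli exogenous variables, Thompson Sampling, cumulative regret averaged over repeated runs) can be illustrated with a small Python sketch. This is a hypothetical toy example, not the authors' released code: the three-variable graph (Z → X → Y, with an unobserved confounder u_c shared by Z and Y), the operation set OPS, the arm list, and all function names (random_scm, sample_y, expected_y, thompson_regret) are assumptions made for illustration. No POMIS or PAG reasoning is performed; every candidate intervention is simply treated as an arm.

```python
# Hypothetical sketch of a tiny SCM-MAB; not the authors' implementation.
import operator
import random
from itertools import product

# Assumed set of binary logical operations used to build each mechanism.
OPS = [operator.and_, operator.or_, operator.xor]
# Assumed exogenous bits; u_c feeds both Z and Y (an unobserved confounder).
EXOGENOUS = ("u_z", "u_x", "u_y", "u_c")


def random_scm(rng):
    """Draw a Bernoulli parameter per exogenous bit and a random op per mechanism."""
    p = {u: rng.random() for u in EXOGENOUS}  # uniform on [0, 1)
    f = {v: rng.choice(OPS) for v in ("z", "x", "y")}
    return p, f


def structural_y(f, u, do):
    """Evaluate Z -> X -> Y for exogenous bits `u` under the intervention dict `do`."""
    z = do["z"] if "z" in do else f["z"](u["u_z"], u["u_c"])
    x = do["x"] if "x" in do else f["x"](z, u["u_x"])
    return f["y"](x, u["u_y"] ^ u["u_c"])


def sample_y(scm, do, rng):
    """Draw one reward Y under do(...) by sampling the exogenous bits."""
    p, f = scm
    u = {k: int(rng.random() < q) for k, q in p.items()}
    return structural_y(f, u, do)


def expected_y(scm, do):
    """Exact E[Y | do(...)] by enumerating all 2^4 exogenous configurations."""
    p, f = scm
    total = 0.0
    for bits in product((0, 1), repeat=len(EXOGENOUS)):
        u = dict(zip(EXOGENOUS, bits))
        weight = 1.0
        for k in EXOGENOUS:
            weight *= p[k] if u[k] else 1.0 - p[k]
        total += weight * structural_y(f, u, do)
    return total


def thompson_regret(scm, arms, horizon, rng):
    """Thompson Sampling with Beta(1, 1) priors; returns the cumulative pseudo-regret curve."""
    mu = [expected_y(scm, a) for a in arms]
    best = max(mu)
    alpha = [1] * len(arms)
    beta = [1] * len(arms)
    total, curve = 0.0, []
    for _ in range(horizon):
        # Sample a plausible mean for every arm and play the largest one.
        k = max(range(len(arms)), key=lambda i: rng.betavariate(alpha[i], beta[i]))
        y = sample_y(scm, arms[k], rng)
        alpha[k] += y
        beta[k] += 1 - y
        total += best - mu[k]  # expected gap of the chosen arm
        curve.append(total)
    return curve


if __name__ == "__main__":
    rng = random.Random(0)
    scm = random_scm(rng)
    # Hypothetical arm set: no intervention plus every single-variable intervention.
    arms = [{}, {"x": 0}, {"x": 1}, {"z": 0}, {"z": 1}]
    runs = [thompson_regret(scm, arms, 2000, rng) for _ in range(20)]
    mean_final = sum(r[-1] for r in runs) / len(runs)
    print(f"mean final cumulative regret over {len(runs)} runs: {mean_final:.2f}")
```

Regret here is pseudo-regret (the gap between the best arm's exact expected reward and the chosen arm's), one common way to compute cumulative regret; averaging the final value over repeated runs mirrors, at a much smaller scale, the 1,000 repetitions reported in the table.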
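KL-UCB, the second solver named in the table, plays the arm with the largest optimistic index. A minimal sketch of the standard Bernoulli KL-UCB index (the largest q with N·KL(μ̂, q) ≤ log t + c·log log t, found by bisection) is given below; the function names and the defaults c = 0 and 60 bisection steps are assumptions, and the exact variant used in the paper's experiments is not stated on this page.

```python
# Hypothetical sketch of the Bernoulli KL-UCB rule; not the authors' implementation.
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def kl_ucb_index(mean, pulls, t, c=0.0, iters=60):
    """Largest q in [mean, 1] with pulls * KL(mean, q) <= log(t) + c * log(log(t))."""
    bound = math.log(t)
    if c and t > math.e:
        bound += c * math.log(math.log(t))
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if pulls * bernoulli_kl(mean, mid) <= bound:
            lo = mid  # mid still lies inside the confidence region
        else:
            hi = mid
    return lo


def kl_ucb_select(successes, pulls, t):
    """Pull every arm once, then pick the arm with the largest KL-UCB index at round t."""
    for i, n in enumerate(pulls):
        if n == 0:
            return i
    indices = [kl_ucb_index(s / n, n, t) for s, n in zip(successes, pulls)]
    return max(range(len(indices)), key=indices.__getitem__)
```

Because KL(μ̂, q) increases monotonically for q ≥ μ̂, bisection is sufficient: an arm pulled only a few times receives a wide, optimistic index, while a well-sampled arm's index shrinks toward its empirical mean.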