Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits
Authors: Yi Shen, Pan Xu, Michael Zavlanos
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further validate our approach using a public dataset that was recorded in a randomized stroke trial. In our study, we validate the proposed methods using a randomized controlled trial dataset that studies the effects of drug treatment on acute ischemic stroke, further demonstrating their effectiveness. |
| Researcher Affiliation | Academia | Yi Shen EMAIL Duke University Pan Xu EMAIL Duke University Michael M. Zavlanos EMAIL Duke University |
| Pseudocode | Yes | Algorithm 1 Policy learning using biased stochastic gradient descent |
| Open Source Code | No | The text does not contain an explicit statement about releasing code or a link to a code repository for the methodology described in this paper. |
| Open Datasets | Yes | We further validate our approach using a public dataset that was recorded in a randomized stroke trial. We validate our regularized Wasserstein DRO methods for both OPE and OPL problems on the International Stroke Trial (IST) (Group et al., 1997) dataset. The IST dataset (Sandercock et al., 2011) includes 19,435 patients... |
| Dataset Splits | Yes | To introduce distribution shifts, we split the dataset into a training set and a testing set, and we introduce a selection bias into the training set. Specifically, we randomly remove 50% of the patients in the training set who are not fully conscious. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions commercial linear programming solvers like Gurobi as an example for solving LPs, citing its reference manual from 2023, but it does not specify concrete software dependencies, including library names with version numbers, used for the reported experiments. |
| Experiment Setup | Yes | The decision tree parameters are cross-validated and shown in Table 4. Table 4 (decision tree parameters): max depth — 4 (Action 1), 4 (Action 2); min samples leaf — 5 (Action 1), 2 (Action 2); score function — mean squared error for both actions. |
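The dataset-split protocol quoted above (split into train/test, then randomly remove 50% of training patients who are not fully conscious to induce selection bias) can be sketched as follows. This is a minimal stdlib illustration, not the authors' code: the `biased_split` function name, the `fully_conscious` record field, and the 70/30 split fraction are assumptions for the example.

```python
import random

def biased_split(patients, train_frac=0.7, drop_frac=0.5, seed=0):
    """Split patient records into train/test, then introduce a selection
    bias by randomly dropping `drop_frac` of the training patients who
    are not fully conscious (as in the IST experiment described above)."""
    rng = random.Random(seed)
    shuffled = patients[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    train, test = shuffled[:cut], shuffled[cut:]
    # Keep every fully conscious patient; keep each non-fully-conscious
    # patient only with probability 1 - drop_frac.
    biased_train = [p for p in train
                    if p["fully_conscious"] or rng.random() >= drop_frac]
    return biased_train, test

# Usage: half the hypothetical cohort is not fully conscious, so the
# biased training set shrinks while the test set is left untouched.
cohort = [{"id": i, "fully_conscious": i % 2 == 0} for i in range(1000)]
train, test = biased_split(cohort)
print(len(train), len(test))
```

The test set is deliberately left unbiased so that the distribution shift exists only between training and evaluation data, which is the setting the paper's distributionally robust methods target.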