Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits
Authors: Yi Shen, Pan Xu, Michael Zavlanos
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further validate our approach using a public dataset that was recorded in a randomized stroke trial. In our study, we validate the proposed methods using a randomized controlled trial dataset that studies the effects of drug treatment on acute ischemic stroke, further demonstrating their effectiveness. |
| Researcher Affiliation | Academia | Yi Shen EMAIL Duke University Pan Xu EMAIL Duke University Michael M. Zavlanos EMAIL Duke University |
| Pseudocode | Yes | Algorithm 1 Policy learning using biased stochastic gradient descent |
| Open Source Code | No | The text does not contain an explicit statement about releasing code or a link to a code repository for the methodology described in this paper. |
| Open Datasets | Yes | We further validate our approach using a public dataset that was recorded in a randomized stroke trial. We validate our regularized Wasserstein DRO methods for both OPE and OPL problems on the International Stroke Trial (IST) (Group et al., 1997) dataset. The IST dataset (Sandercock et al., 2011) includes 19,435 patients... |
| Dataset Splits | Yes | To introduce distribution shifts, we split the dataset into a training set and a testing set, and we introduce a selection bias into the training set. Specifically, we randomly remove 50% of the patients in the training set who are not fully conscious. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions commercial linear programming solvers like Gurobi as an example for solving LPs, citing its reference manual from 2023, but it does not specify concrete software dependencies, including library names with version numbers, used for the reported experiments. |
| Experiment Setup | Yes | The decision tree parameters are cross-validated and shown in Table 4. Table 4 (decision tree parameters): max depth — 4 (Action 1), 4 (Action 2); min samples leaf — 5 (Action 1), 2 (Action 2); score function — mean squared error for both actions. |
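The dataset-split protocol quoted above (split into train/test, then randomly remove 50% of training patients who are not fully conscious to induce selection bias) can be sketched as follows. This is a minimal stdlib illustration, not the authors' code: the `biased_split` function name, the `fully_conscious` record field, and the 70/30 split fraction are assumptions for the example.

```python
import random

def biased_split(patients, train_frac=0.7, drop_frac=0.5, seed=0):
    """Split patient records into train/test, then introduce a selection
    bias by randomly dropping `drop_frac` of the training patients who
    are not fully conscious (as in the IST experiment described above)."""
    rng = random.Random(seed)
    shuffled = patients[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    train, test = shuffled[:cut], shuffled[cut:]
    # Keep every fully conscious patient; keep each non-fully-conscious
    # patient only with probability 1 - drop_frac.
    biased_train = [p for p in train
                    if p["fully_conscious"] or rng.random() >= drop_frac]
    return biased_train, test

# Usage: half the hypothetical cohort is not fully conscious, so the
# biased training set shrinks while the test set is left untouched.
cohort = [{"id": i, "fully_conscious": i % 2 == 0} for i in range(1000)]
train, test = biased_split(cohort)
print(len(train), len(test))
```

The test set is deliberately left unbiased so that the distribution shift exists only between training and evaluation data, which is the setting the paper's distributionally robust methods target.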