Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Near-Optimal Reinforcement Learning in Dynamic Treatment Regimes
Authors: Junzhe Zhang, Elias Bareinboim
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results are validated on randomly generated DTRs and multi-stage clinical trials on cancer treatment. We demonstrate our algorithms on several dynamic treatment regimes, including randomly generated DTRs, and the survival model in the context of multi-stage cancer treatment. |
| Researcher Affiliation | Academia | Junzhe Zhang Department of Computer Science Columbia University New York, NY 10027 EMAIL Elias Bareinboim Department of Computer Science Columbia University New York, NY 10027 EMAIL |
| Pseudocode | Yes | Algorithm 1: UC-DTR; Algorithm 2: Causal UC-DTR (UCc-DTR) |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We test the survival model of the two-stage clinical trial conducted by the Cancer and Leukemia Group B [16, 37]. |
| Dataset Splits | No | The paper mentions generating random instances and using data from a clinical trial, but does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers. |
| Experiment Setup | Yes | Each experiment lasts for T = 1.1 × 10^4 episodes. The parameter δ = 1/(KT) for UC-DTR and UCc-DTR, where K is the total stages of interventions. For all algorithms, we measure their cumulative regret over 200 repetitions. We test... with causal bounds derived from 1 × 10^5 confounded observational samples. |
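
The Experiment Setup row gives enough numbers to sketch the evaluation arithmetic. The Python snippet below is a minimal illustration, not the authors' code: it computes δ = 1/(KT) and averages a cumulative regret curve over repetitions. The value K = 2 is an assumption (suggested by the two-stage clinical trial, not stated in the row), and `mean_cumulative_regret`, `optimal_value`, and the simulated rewards are hypothetical names and placeholder data. Regret here is approximated from realized rewards, which may differ from the paper's exact definition.

```python
import numpy as np

T = 11_000         # episodes per experiment (1.1 × 10^4, from the table)
REPETITIONS = 200  # independent repetitions (from the table)
K = 2              # ASSUMPTION: number of intervention stages

# Confidence parameter used by UC-DTR and UCc-DTR, per the table.
delta = 1.0 / (K * T)


def mean_cumulative_regret(rewards, optimal_value):
    """Average cumulative regret across repetitions.

    rewards: array-like of shape (repetitions, episodes) holding the
        reward observed in each episode (hypothetical input format).
    optimal_value: expected per-episode reward of the optimal policy.
    """
    rewards = np.asarray(rewards, dtype=float)
    per_episode_regret = optimal_value - rewards
    # Accumulate over episodes, then average over repetitions.
    return np.cumsum(per_episode_regret, axis=1).mean(axis=0)


# Placeholder data only: Bernoulli rewards from a made-up suboptimal policy.
rng = np.random.default_rng(0)
fake_rewards = rng.binomial(1, 0.6, size=(REPETITIONS, T))
curve = mean_cumulative_regret(fake_rewards, optimal_value=0.7)
```

Averaging after the cumulative sum keeps one curve point per episode, which is the shape usually plotted when comparing regret across algorithms.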