Near-Optimal Reinforcement Learning in Dynamic Treatment Regimes
Authors: Junzhe Zhang, Elias Bareinboim
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results are validated on randomly generated DTRs and multi-stage clinical trials on cancer treatment. We demonstrate our algorithms on several dynamic treatment regimes, including randomly generated DTRs, and the survival model in the context of multi-stage cancer treatment. |
| Researcher Affiliation | Academia | Junzhe Zhang, Department of Computer Science, Columbia University, New York, NY 10027, junzhez@cs.columbia.edu; Elias Bareinboim, Department of Computer Science, Columbia University, New York, NY 10027, eb@cs.columbia.edu |
| Pseudocode | Yes | Algorithm 1: UC-DTR; Algorithm 2: Causal UC-DTR (UCc-DTR) |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We test the survival model of the two-stage clinical trial conducted by the Cancer and Leukemia Group B [16, 37]. |
| Dataset Splits | No | The paper mentions generating random instances and using data from a clinical trial, but does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers. |
| Experiment Setup | Yes | Each experiment lasts for T = 1.1 × 10^4 episodes. The parameter δ = 1/KT for UC-DTR and UCc-DTR, where K is the total number of intervention stages. For all algorithms, we measure their cumulative regret over 200 repetitions. We test... with causal bounds derived from 1 × 10^5 confounded observational samples. |
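To make the reported setup concrete, the sketch below wires together the stated constants (T = 1.1 × 10^4 episodes, δ = 1/KT, 200 repetitions, 1 × 10^5 observational samples). This is not the authors' code: the stage count K = 2 is assumed from the two-stage clinical trial, and the Hoeffding-style confidence radius is a generic stand-in — the exact bonus used by UC-DTR/UCc-DTR may differ.

```python
import math

# Experiment constants as reported in the paper
T = int(1.1e4)    # episodes per experiment
K = 2             # intervention stages (assumed: two-stage trial)
REPETITIONS = 200 # runs over which cumulative regret is averaged
N_OBS = int(1e5)  # confounded observational samples for causal bounds

# Failure probability used by UC-DTR / UCc-DTR
delta = 1.0 / (K * T)

def hoeffding_bonus(n_visits: int, delta: float) -> float:
    """Generic Hoeffding-style confidence radius for an empirical mean.
    Illustrative only; the precise bonus in UC-DTR may differ."""
    if n_visits == 0:
        return 1.0  # maximal uncertainty before any visits
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n_visits))

print(f"delta = {delta:.2e}")  # 4.55e-05 for K=2, T=11000
print(f"bonus at n=100 visits: {hoeffding_bonus(100, delta):.4f}")
```

The bonus shrinks as O(1/sqrt(n)) with the visit count, which is what drives the sublinear regret guarantees of UCB-style algorithms like UC-DTR.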