Near-Optimal Reinforcement Learning in Dynamic Treatment Regimes

Authors: Junzhe Zhang, Elias Bareinboim

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results are validated on randomly generated DTRs and multi-stage clinical trials on cancer treatment. We demonstrate our algorithms on several dynamic treatment regimes, including randomly generated DTRs and the survival model in the context of multi-stage cancer treatment.
Researcher Affiliation | Academia | Junzhe Zhang, Department of Computer Science, Columbia University, New York, NY 10027, junzhez@cs.columbia.edu; Elias Bareinboim, Department of Computer Science, Columbia University, New York, NY 10027, eb@cs.columbia.edu
Pseudocode | Yes | Algorithm 1: UC-DTR; Algorithm 2: Causal UC-DTR (UCc-DTR). A simplified sketch of the optimistic-planning pattern these algorithms follow appears after the table.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | We test the survival model of the two-stage clinical trial conducted by the Cancer and Leukemia Group B [16, 37].
Dataset Splits | No | The paper mentions generating random instances and using data from a clinical trial, but does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications).
Software Dependencies | No | The paper does not list any specific software dependencies with version numbers.
Experiment Setup | Yes | Each experiment lasts for T = 1.1 × 10^4 episodes. The parameter δ = 1/(KT) for UC-DTR and UCc-DTR, where K is the total number of intervention stages. For all algorithms, we measure their cumulative regret over 200 repetitions. We test... with causal bounds derived from 1 × 10^5 confounded observational samples. (These parameters are combined in the harness sketch after the table.)
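
As a rough, hypothetical illustration of the optimism-in-the-face-of-uncertainty pattern that UC-DTR (Algorithm 1, referenced in the Pseudocode row) instantiates, here is a minimal sketch for a tabular, finite-horizon DTR with discrete states and treatments. It uses a Markovian simplification of the history-based state in the paper; the environment interface (env_reset, env_step), the Hoeffding-style bonus, and all names are illustrative assumptions, not the paper's exact confidence sets. UCc-DTR (Algorithm 2) would additionally tighten the estimates with causal bounds derived from observational data.

    import numpy as np

    def uc_dtr_sketch(env_reset, env_step, n_states, n_actions, horizon,
                      n_episodes, delta):
        """Simplified optimistic episodic learner for a tabular finite-horizon DTR.

        Keeps counts of observed transitions and rewards, plans optimistically
        with a Hoeffding-style bonus, runs the resulting policy, and updates
        the counts. env_reset() -> initial state; env_step(s, x, k) -> (s', r);
        both are assumed interfaces, not the paper's API.
        """
        counts = np.zeros((horizon, n_states, n_actions))
        trans = np.zeros((horizon, n_states, n_actions, n_states))
        rew_sum = np.zeros((horizon, n_states, n_actions))
        episode_rewards = []

        for _ in range(n_episodes):
            # Optimistic planning by backward induction over the stages.
            V = np.zeros((horizon + 1, n_states))
            policy = np.zeros((horizon, n_states), dtype=int)
            for k in reversed(range(horizon)):
                n = np.maximum(counts[k], 1.0)
                p_hat = trans[k] / n[..., None]                  # empirical transition estimates
                r_hat = rew_sum[k] / n                           # empirical mean rewards
                bonus = np.sqrt(2.0 * np.log(2.0 / delta) / n)   # illustrative exploration bonus
                q_opt = r_hat + p_hat @ V[k + 1] + bonus         # optimistic Q-values
                policy[k] = np.argmax(q_opt, axis=1)
                V[k] = np.max(q_opt, axis=1)

            # Execute the optimistic policy for one episode and update counts.
            s, total = env_reset(), 0.0
            for k in range(horizon):
                x = policy[k, s]
                s_next, r = env_step(s, x, k)
                counts[k, s, x] += 1
                trans[k, s, x, s_next] += 1
                rew_sum[k, s, x] += r
                total += r
                s = s_next
            episode_rewards.append(total)

        return np.array(episode_rewards)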
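
The Experiment Setup row quotes T = 1.1 × 10^4 episodes, δ = 1/(KT), and 200 repetitions. The following hypothetical harness (not from the paper) shows how such parameters would typically combine with the sketch above to produce an averaged cumulative-regret curve; K = 2, optimal_value, env_reset, env_step, n_states, and n_actions are assumed placeholders.

    import numpy as np

    K = 2                     # intervention stages (assumption; matches the two-stage trial)
    T = int(1.1e4)            # episodes per experiment, as quoted above
    delta = 1.0 / (K * T)     # confidence parameter for UC-DTR / UCc-DTR, as quoted above
    n_repetitions = 200       # independent runs averaged when reporting regret

    def cumulative_regret(episode_rewards, optimal_value):
        """Cumulative regret after each episode: running sum of (V* - realized reward)."""
        return np.cumsum(optimal_value - np.asarray(episode_rewards))

    # Hypothetical usage with the sketch above (placeholders: env_reset, env_step,
    # n_states, n_actions, optimal_value):
    # runs = [cumulative_regret(
    #             uc_dtr_sketch(env_reset, env_step, n_states, n_actions, K, T, delta),
    #             optimal_value)
    #         for _ in range(n_repetitions)]
    # mean_regret = np.mean(runs, axis=0)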