Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Near-Optimal Reinforcement Learning in Dynamic Treatment Regimes

Authors: Junzhe Zhang, Elias Bareinboim

NeurIPS 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our results are validated on randomly generated DTRs and multi-stage clinical trials on cancer treatment. We demonstrate our algorithms on several dynamic treatment regimes, including randomly generated DTRs, and the survival model in the context of multi-stage cancer treatment." |
| Researcher Affiliation | Academia | "Junzhe Zhang, Department of Computer Science, Columbia University, New York, NY 10027; Elias Bareinboim, Department of Computer Science, Columbia University, New York, NY 10027" |
| Pseudocode | Yes | "Algorithm 1: UC-DTR; Algorithm 2: Causal UC-DTR (UCc-DTR)" |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | "We test the survival model of the two-stage clinical trial conducted by the Cancer and Leukemia Group B [16, 37]." |
| Dataset Splits | No | The paper mentions generating random instances and using clinical-trial data, but does not specify explicit training, validation, or test splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not describe the hardware used for the experiments (e.g., GPU/CPU models or memory specifications). |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers. |
| Experiment Setup | Yes | "Each experiment lasts for T = 1.1 × 10^4 episodes. The parameter δ = 1/KT for uc-dtr and ucc-dtr, where K is the total number of intervention stages. For all algorithms, we measure their cumulative regret over 200 repetitions. We test ... with causal bounds derived from 1 × 10^5 confounded observational samples." |
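As a hedged illustration only (not the authors' code, which is not released), the experiment parameters quoted above can be encoded as follows. The `run_episode` function is a hypothetical placeholder standing in for one UC-DTR interaction episode, and the choice `K = 2` is an assumption motivated by the two-stage clinical trial mentioned in the paper.

```python
import random

# Parameters quoted from the paper's experiment setup.
T = int(1.1e4)         # episodes per experiment
K = 2                  # intervention stages (assumed: two-stage trial)
DELTA = 1.0 / (K * T)  # confidence parameter delta = 1/KT for UC-DTR / UCc-DTR
N_REPS = 200           # repetitions used when averaging cumulative regret
N_OBS = int(1e5)       # confounded observational samples for causal bounds

def run_episode(rng):
    """Hypothetical stand-in for one UC-DTR episode.

    Returns a per-episode regret value; here just a random number,
    since the real DTR environment is not specified in this report.
    """
    return rng.random()

def mean_cumulative_regret(n_reps=N_REPS, horizon=T, seed=0):
    """Average cumulative regret over independent repetitions."""
    total = 0.0
    for rep in range(n_reps):
        rng = random.Random(seed + rep)  # fresh stream per repetition
        total += sum(run_episode(rng) for _ in range(horizon))
    return total / n_reps
```

This sketch only shows how the reported quantities (horizon T, confidence δ = 1/KT, 200 repetitions) would plug into a regret-measurement loop; the actual UC-DTR update rules are given in Algorithms 1 and 2 of the paper.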