Position: Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination

Authors: Zhiyao Luo, Yangchen Pan, Peter Watkinson, Tingting Zhu

ICML 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Through a case study with more than 17,000 evaluation experiments using a publicly available Sepsis dataset, we demonstrate that the performance of RL algorithms can significantly vary with changes in evaluation metrics and Markov Decision Process (MDP) formulations. |
| Researcher Affiliation | Academia | (1) Department of Engineering Science, University of Oxford, Parks Road, Oxford OX1 3PJ, United Kingdom; (2) Nuffield Department of Population Health (NDPH), University of Oxford, Richard Doll Building, Old Road Campus, Headington, Oxford OX3 7LF, United Kingdom |
| Pseudocode | No | No explicit pseudocode or algorithm blocks (e.g., labeled "Algorithm 1") were found in the paper. |
| Open Source Code | Yes | Code is available at https://github.com/GilesLuo/ReassessDTR. |
| Open Datasets | Yes | The dataset is derived from the Medical Information Mart for Intensive Care III (MIMIC-III) database (Johnson et al., 2016). |
| Dataset Splits | Yes | The dataset is divided into training, validation, and testing sets, comprising 70%, 15%, and 15% of the data, respectively. |
| Hardware Specification | No | No specific hardware (e.g., GPU/CPU models, memory, or cloud instance types) used for the experiments was explicitly detailed in the paper. |
| Software Dependencies | No | The paper mentions machine learning techniques and models (e.g., LSTM, DQN, CQL) but does not provide specific version numbers for software dependencies or libraries (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | For hyperparameter optimization, grid search is performed in a unified search space; please see Appendix E.2 for any missing details. |
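
To illustrate the setup reported above, the sketch below shows a generic 70%/15%/15% train/validation/test split followed by a grid search over a unified hyperparameter space. The split ratios and the use of grid search come from the paper; everything else (the parameter names, candidate values, and the `train_and_evaluate` callable) is a hypothetical placeholder rather than the authors' actual search space, which is specified in Appendix E.2 of the paper.

```python
import itertools
import random

def split_dataset(records, seed=0):
    """Shuffle and split records into 70% train, 15% validation, 15% test.

    The 70/15/15 ratios follow the paper; the splitting mechanics here
    are a generic illustration, not the authors' implementation.
    """
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.70 * n)
    n_val = int(0.15 * n)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# Placeholder "unified search space" shared by all algorithms.
# These parameter names and values are illustrative only.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "batch_size": [64, 128],
    "discount": [0.99, 0.999],
}

def grid_search(train_set, val_set, train_and_evaluate):
    """Evaluate every hyperparameter combination and return the best one.

    `train_and_evaluate(train_set, val_set, config)` is a user-supplied
    callable (hypothetical) that trains a policy with `config` and
    returns a scalar validation score.
    """
    best_score, best_config = float("-inf"), None
    keys = list(SEARCH_SPACE)
    for values in itertools.product(*(SEARCH_SPACE[k] for k in keys)):
        config = dict(zip(keys, values))
        score = train_and_evaluate(train_set, val_set, config)
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score
```

The exhaustive loop over `itertools.product` mirrors what a grid search in a unified space entails: every algorithm sees the same candidate configurations, so differences in reported performance are not attributable to unequal tuning budgets.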