Position: Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination
Authors: Zhiyao Luo, Yangchen Pan, Peter Watkinson, Tingting Zhu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through a case study with more than 17,000 evaluation experiments using a publicly available Sepsis dataset, we demonstrate that the performance of RL algorithms can significantly vary with changes in evaluation metrics and Markov Decision Process (MDP) formulations. |
| Researcher Affiliation | Academia | 1Department of Engineering Science, University of Oxford, Parks Road, Oxford OX1 3PJ, United Kingdom 2Nuffield Department of Population Health (NDPH), University of Oxford, Richard Doll Building, Old Road Campus, Headington, Oxford OX3 7LF, United Kingdom |
| Pseudocode | No | No explicit pseudocode or algorithm blocks (e.g., labeled 'Algorithm 1') were found in the paper. |
| Open Source Code | Yes | Code is available at https://github.com/GilesLuo/ReassessDTR. |
| Open Datasets | Yes | The dataset is derived from the Medical Information Mart for Intensive Care III (MIMIC-III) database (Johnson et al., 2016). |
| Dataset Splits | Yes | The dataset is divided into training, validation, and testing sets, comprising 70%, 15%, and 15% of the data, respectively. |
| Hardware Specification | No | No specific hardware (e.g., GPU/CPU models, memory, or cloud instance types) used for the experiments was explicitly detailed in the paper. |
| Software Dependencies | No | The paper mentions machine learning techniques and models (e.g., LSTM, DQN, CQL) but does not provide specific version numbers for software dependencies or libraries (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | For hyperparameter optimization, grid search is performed in a unified search space. Please see Appendix E.2 for any missing details. |
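The 70/15/15 split reported in the table can be sketched as follows. This is a minimal illustration with hypothetical names (`split_indices`, a seed of 0, 1000 samples); the authors' actual preprocessing of the MIMIC-III cohort may differ, e.g. it may split at the patient level rather than over raw indices.

```python
import numpy as np

def split_indices(n, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle n sample indices and partition them into
    train/validation/test subsets at the given fractions
    (70/15/15 by default, matching the paper's reported split)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(1000)
```

With 1000 samples this yields disjoint subsets of 700, 150, and 150 indices.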
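The grid search over a unified search space mentioned in the last row can be sketched generically. The hyperparameter names and values below are hypothetical placeholders; the paper's actual grid is specified in its Appendix E.2.

```python
from itertools import product

# Hypothetical search space; the paper's unified grid may differ.
search_space = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "batch_size": [64, 128],
    "gamma": [0.95, 0.99],
}

def grid(space):
    """Yield one configuration dict per point in the
    Cartesian product of the hyperparameter value lists."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(grid(search_space))
```

Each configuration would then be trained and evaluated; here the grid has 3 × 2 × 2 = 12 configurations.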