Position: Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination

Authors: Zhiyao Luo, Yangchen Pan, Peter Watkinson, Tingting Zhu

ICML 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Through a case study with more than 17,000 evaluation experiments using a publicly available Sepsis dataset, we demonstrate that the performance of RL algorithms can significantly vary with changes in evaluation metrics and Markov Decision Process (MDP) formulations. |
| Researcher Affiliation | Academia | (1) Department of Engineering Science, University of Oxford, Parks Road, Oxford OX1 3PJ, United Kingdom; (2) Nuffield Department of Population Health (NDPH), University of Oxford, Richard Doll Building, Old Road Campus, Headington, Oxford OX3 7LF, United Kingdom |
| Pseudocode | No | No explicit pseudocode or algorithm blocks (e.g., labeled "Algorithm 1") were found in the paper. |
| Open Source Code | Yes | Code is available at https://github.com/GilesLuo/ReassessDTR. |
| Open Datasets | Yes | The dataset is derived from the Medical Information Mart for Intensive Care III (MIMIC-III) database (Johnson et al., 2016). |
| Dataset Splits | Yes | The dataset is divided into training, validation, and testing sets, comprising 70%, 15%, and 15% of the data, respectively. |
| Hardware Specification | No | No specific hardware (e.g., GPU/CPU models, memory, or cloud instance types) used for the experiments was explicitly detailed in the paper. |
| Software Dependencies | No | The paper mentions machine learning techniques and models (e.g., LSTM, DQN, CQL) but does not provide specific version numbers for software dependencies or libraries (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | For hyperparameter optimization, grid search is performed in a unified search space; please see Appendix E.2 for any missing details. |
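
To illustrate the setup reported above, the sketch below shows a generic 70%/15%/15% train/validation/test split followed by a grid search over a unified hyperparameter space. The split ratios and the use of grid search come from the paper; everything else (the parameter names, candidate values, and the `train_and_evaluate` callable) is a hypothetical placeholder rather than the authors' actual search space, which is specified in Appendix E.2 of the paper.

```python
import itertools
import random

def split_dataset(records, seed=0):
    """Shuffle and split records into 70% train, 15% validation, 15% test.

    The 70/15/15 ratios follow the paper; the splitting mechanics here
    are a generic illustration, not the authors' implementation.
    """
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.70 * n)
    n_val = int(0.15 * n)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# Placeholder "unified search space" shared by all algorithms.
# These parameter names and values are illustrative only.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "batch_size": [64, 128],
    "discount": [0.99, 0.999],
}

def grid_search(train_set, val_set, train_and_evaluate):
    """Evaluate every hyperparameter combination and return the best one.

    `train_and_evaluate(train_set, val_set, config)` is a user-supplied
    callable (hypothetical) that trains a policy with `config` and
    returns a scalar validation score.
    """
    best_score, best_config = float("-inf"), None
    keys = list(SEARCH_SPACE)
    for values in itertools.product(*(SEARCH_SPACE[k] for k in keys)):
        config = dict(zip(keys, values))
        score = train_and_evaluate(train_set, val_set, config)
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score
```

The exhaustive loop over `itertools.product` mirrors what a grid search in a unified space entails: every algorithm sees the same candidate configurations, so differences in reported performance are not attributable to unequal tuning budgets.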