Quasi-optimal Reinforcement Learning with Continuous Actions
Authors: Yuhan Li, Wenzhuo Zhou, Ruoqing Zhu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our algorithm with comprehensive simulated experiments and a dose suggestion real application to Ohio Type 1 diabetes dataset. ... Empirical analyses are conducted with comprehensive numerical experiments and a real-world case study, to evaluate the model performance in practice. |
| Researcher Affiliation | Academia | Yuhan Li (University of Illinois Urbana-Champaign), Wenzhuo Zhou (University of California Irvine), Ruoqing Zhu (University of Illinois Urbana-Champaign) |
| Pseudocode | Yes | Algorithm 1 Quasi-optimal Learning in Continuous Action Spaces |
| Open Source Code | Yes | We include the reproducible code for all the experiments and the guideline for access to the Ohio Type I Diabetes dataset in the GitHub link https://github.com/liyuhan529/Quasi-optimal-Learning. |
| Open Datasets | Yes | Ohio type 1 diabetes (Ohio T1DM) dataset (Marling & Bunescu, 2020), which contains 2 cohorts of patients with Type-1 diabetes... and the guideline for access to the Ohio Type I Diabetes dataset in the GitHub link https://github.com/liyuhan529/Quasi-optimal-Learning. |
| Dataset Splits | No | The paper describes a cross-validation procedure for selecting the µ parameter and explains how samples are generated or selected, but it does not specify explicit training, validation, and test splits for model evaluation. |
| Hardware Specification | Yes | The synthetic experiments are conducted on a single 2.3 GHz Dual-Core Intel Core i5 CPU. |
| Software Dependencies | No | The paper mentions using 'Adam (Kingma & Ba, 2014) as the optimizer' and the 'd3rlpy' library (Seno & Imai, 2021), but it does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We set the learning rate α_j for the j-th iteration to be α_0 / (1 + d·j), where α_0 is the learning rate of the initial iteration and d is the decay rate of the learning rate. When n = 25, we set the batch size to 5, and when n = 50, we set the batch size to 7. We use the L2 distance between iterative parameters as the stopping criterion for the SGD algorithm. The µ selected for each experiment, along with the learning rates and their decay rates, are shown in Tables 2, 3, and 4. |
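
The learning-rate schedule and stopping rule quoted above are concrete enough to illustrate. Below is a minimal sketch, assuming the schedule α_j = α_0 / (1 + d·j) and the L2-distance stopping criterion as quoted; the names `sgd_with_decay`, `grad_fn`, `alpha0`, `decay`, and `tol` are hypothetical and not taken from the authors' released code.

```python
import numpy as np

def sgd_with_decay(grad_fn, theta, alpha0=0.1, decay=0.01, tol=1e-6, max_iter=10_000):
    """Gradient descent with the quoted schedule alpha_j = alpha0 / (1 + decay * j).

    Stops when the L2 distance between successive parameter iterates falls
    below `tol`, mirroring the stopping criterion described in the paper.
    All defaults here are illustrative, not the paper's tuned values.
    """
    for j in range(max_iter):
        alpha_j = alpha0 / (1.0 + decay * j)          # decaying learning rate at iteration j
        theta_new = theta - alpha_j * grad_fn(theta)  # one (stochastic) gradient step
        if np.linalg.norm(theta_new - theta) < tol:   # L2-distance stopping criterion
            return theta_new
        theta = theta_new
    return theta

# Toy usage: minimize f(theta) = ||theta||^2, whose gradient is 2 * theta.
theta_hat = sgd_with_decay(lambda t: 2.0 * t, theta=np.array([1.0, -2.0]))
```

In the paper's setting the gradient would be a mini-batch estimate (batch size 5 for n = 25, 7 for n = 50); the deterministic toy gradient here only demonstrates the schedule and the stopping rule.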