Quasi-optimal Reinforcement Learning with Continuous Actions

Authors: Yuhan Li, Wenzhuo Zhou, Ruoqing Zhu

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our algorithm with comprehensive simulated experiments and a dose suggestion real application to Ohio Type 1 diabetes dataset." and "Empirical analyses are conducted with comprehensive numerical experiments and a real-world case study, to evaluate the model performance in practice."
Researcher Affiliation | Academia | Yuhan Li (University of Illinois Urbana-Champaign), Wenzhuo Zhou (University of California Irvine), Ruoqing Zhu (University of Illinois Urbana-Champaign)
Pseudocode | Yes | Algorithm 1: Quasi-optimal Learning in Continuous Action Spaces
Open Source Code | Yes | "We include the reproducible code for all the experiments and the guideline for access to the Ohio Type I Diabetes dataset in the GitHub link https://github.com/liyuhan529/Quasi-optimal-Learning."
Open Datasets | Yes | "Ohio type 1 diabetes (OhioT1DM) dataset (Marling & Bunescu, 2020), which contains 2 cohorts of patients with Type-1 diabetes..." and "guideline for access to the Ohio Type I Diabetes dataset in the GitHub link https://github.com/liyuhan529/Quasi-optimal-Learning."
Dataset Splits | No | The paper describes a cross-validation procedure for selecting the µ parameter and explains how samples are generated or selected, but it does not specify explicit training, validation, and test splits for model evaluation.
Hardware Specification | Yes | "The synthetic experiments are conducted on a single 2.3 GHz Dual-Core Intel Core i5 CPU."
Software Dependencies | No | The paper mentions using "Adam (Kingma & Ba, 2014) as the optimizer" and the d3rlpy library (Seno & Imai, 2021), but it does not specify version numbers for these or other software dependencies.
Experiment Setup | Yes | "We set the learning rate α_j for the j-th iteration to be α0 / (1 + d·j), where α0 is the learning rate of the initial iteration and d is the decay rate of the learning rate. When n = 25, we set the batch size to 5, and when n = 50, we set the batch size to 7. We use the L2 distance between successive parameter iterates as the stopping criterion for the SGD algorithm. The µ selected for each experiment, along with the learning rates and their decay rates, are shown in Tables 2, 3, and 4."
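The reported learning-rate schedule and stopping rule can be illustrated with a short sketch. The code below is not taken from the authors' repository; the function names, the placeholder gradient routine grad_fn, and the default values for α0, d, the batch size, and the tolerance are illustrative assumptions, with the per-experiment values given in Tables 2, 3, and 4 of the paper.

```python
import numpy as np

def decayed_lr(alpha0, d, j):
    # Learning rate for the j-th iteration: alpha_j = alpha0 / (1 + d * j).
    return alpha0 / (1.0 + d * j)

def sgd_with_l2_stopping(grad_fn, theta0, alpha0=0.01, d=0.001,
                         batch_size=5, tol=1e-5, max_iter=10000):
    """Plain SGD loop using the decayed learning rate and an L2-distance
    stopping criterion on successive parameter iterates.

    grad_fn(theta, batch_size) is assumed to return a stochastic gradient
    estimated on a mini-batch; all default values are illustrative only.
    """
    theta = np.asarray(theta0, dtype=float)
    for j in range(max_iter):
        g = grad_fn(theta, batch_size)
        theta_next = theta - decayed_lr(alpha0, d, j) * g
        # Stop once consecutive parameter vectors are close in L2 norm.
        if np.linalg.norm(theta_next - theta) < tol:
            return theta_next
        theta = theta_next
    return theta
```

In this sketch the batch size of 5 mirrors the n = 25 setting quoted above; switching to the n = 50 setting would correspond to batch_size=7.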