Quasi-optimal Reinforcement Learning with Continuous Actions
Authors: Yuhan Li, Wenzhuo Zhou, Ruoqing Zhu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our algorithm with comprehensive simulated experiments and a dose suggestion real application to Ohio Type 1 diabetes dataset. ... Empirical analyses are conducted with comprehensive numerical experiments and a real-world case study, to evaluate the model performance in practice. |
| Researcher Affiliation | Academia | Yuhan Li (University of Illinois Urbana-Champaign), Wenzhuo Zhou (University of California Irvine), Ruoqing Zhu (University of Illinois Urbana-Champaign) |
| Pseudocode | Yes | Algorithm 1 Quasi-optimal Learning in Continuous Action Spaces |
| Open Source Code | Yes | We include the reproducible code for all the experiments and the guideline for access to the Ohio Type I Diabetes dataset in the GitHub link https://github.com/liyuhan529/Quasi-optimal-Learning. |
| Open Datasets | Yes | Ohio type 1 diabetes (Ohio T1DM) dataset (Marling & Bunescu, 2020), which contains 2 cohorts of patients with Type-1 diabetes... and the guideline for access to the Ohio Type I Diabetes dataset in the GitHub link https://github.com/liyuhan529/Quasi-optimal-Learning. |
| Dataset Splits | No | The paper describes a cross-validation procedure for selecting the µ parameter and explains how samples are generated or selected, but it does not specify explicit training, validation, and test splits for model evaluation. |
| Hardware Specification | Yes | The synthetic experiments are conducted on a single 2.3 GHz Dual-Core Intel Core i5 CPU. |
| Software Dependencies | No | The paper mentions using 'Adam (Kingma & Ba, 2014) as the optimizer' and the 'd3rlpy' library (Seno & Imai, 2021), but it does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We set the learning rate α_j for the j-th iteration to be α_0 / (1 + d·j), where α_0 is the learning rate of the initial iteration and d is the decay rate of the learning rate. When n = 25, we set the batch size to 5, and when n = 50, we set the batch size to 7. We use the L2 distance between iterative parameters as the stopping criterion for the SGD algorithm. The µ selected for each experiment, along with the learning rates and their decay rates, are shown in Tables 2, 3, and 4. |
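
The learning-rate schedule and stopping rule quoted above are concrete enough to illustrate. Below is a minimal sketch, assuming the schedule α_j = α_0 / (1 + d·j) and the L2-distance stopping criterion as quoted; the names `sgd_with_decay`, `grad_fn`, `alpha0`, `decay`, and `tol` are hypothetical and not taken from the authors' released code.

```python
import numpy as np

def sgd_with_decay(grad_fn, theta, alpha0=0.1, decay=0.01, tol=1e-6, max_iter=10_000):
    """Gradient descent with the quoted schedule alpha_j = alpha0 / (1 + decay * j).

    Stops when the L2 distance between successive parameter iterates falls
    below `tol`, mirroring the stopping criterion described in the paper.
    All defaults here are illustrative, not the paper's tuned values.
    """
    for j in range(max_iter):
        alpha_j = alpha0 / (1.0 + decay * j)          # decaying learning rate at iteration j
        theta_new = theta - alpha_j * grad_fn(theta)  # one (stochastic) gradient step
        if np.linalg.norm(theta_new - theta) < tol:   # L2-distance stopping criterion
            return theta_new
        theta = theta_new
    return theta

# Toy usage: minimize f(theta) = ||theta||^2, whose gradient is 2 * theta.
theta_hat = sgd_with_decay(lambda t: 2.0 * t, theta=np.array([1.0, -2.0]))
```

In the paper's setting the gradient would be a mini-batch estimate (batch size 5 for n = 25, 7 for n = 50); the deterministic toy gradient here only demonstrates the schedule and the stopping rule.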