Robust exploration in linear quadratic reinforcement learning
Authors: Jack Umenberger, Mina Ferizbegovic, Thomas B. Schön, Håkan Hjalmarsson
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical simulations and application to a hardware-in-the-loop servo-mechanism demonstrate the approach, with appreciable performance and robustness gains over alternative methods observed in both. |
| Researcher Affiliation | Academia | Jack Umenberger (Department of Information Technology, Uppsala University, Sweden; jack.umenberger@it.uu.se); Mina Ferizbegovic (School of Electrical Engineering and Computer Science, KTH, Sweden; minafe@kth.se); Thomas B. Schön (Department of Information Technology, Uppsala University, Sweden; thomas.schon@it.uu.se); Håkan Hjalmarsson (School of Electrical Engineering and Computer Science, KTH, Sweden; hjalmars@kth.se) |
| Pseudocode | Yes | Algorithm 1 Receding horizon application to true system |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper uses data obtained from simulations and a physical servo mechanism, which is custom-generated and not a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper describes how initial data is obtained and used for trials, but it does not specify explicit training, validation, and test dataset splits. |
| Hardware Specification | Yes | for a hardware-in-the-loop simulation comprised of the interconnection of a physical servo mechanism (Quanser QUBE 2) and a synthetic (simulated) LTI dynamical system. |
| Software Dependencies | No | The paper mentions techniques such as convex optimization and semidefinite programming, but it does not name any software libraries or tools, or their version numbers. |
| Experiment Setup | Yes | We partition the time horizon T = 10^3 into N = 10 equally spaced intervals, each of length T_i = 100. For robustness, we set δ = 0.05. [...] with look-ahead horizon h = 10. [...] The total control horizon was T = 1250 (2.5 seconds at 500 Hz) and was divided into N = 5 intervals, each of duration 0.5 seconds. |
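
The interval arithmetic quoted in the Experiment Setup row can be checked with a minimal sketch. The helper name `partition_horizon` and the variable names are hypothetical and are not taken from the paper, which releases no code; the sketch only assumes that the intervals are contiguous and of equal length, as the quoted text states.

```python
def partition_horizon(total_steps, num_intervals):
    """Split range(total_steps) into equal, contiguous (start, stop) intervals.

    Hypothetical helper for illustration only; not the authors' code.
    """
    if total_steps % num_intervals != 0:
        raise ValueError("horizon is not evenly divisible into intervals")
    length = total_steps // num_intervals
    return [(i * length, (i + 1) * length) for i in range(num_intervals)]


# First quoted setting: T = 10^3 steps, N = 10 intervals.
sim_epochs = partition_horizon(1000, 10)

# Second quoted setting: T = 1250 steps sampled at 500 Hz, N = 5 intervals.
sample_rate_hz = 500
hw_epochs = partition_horizon(1250, 5)
hw_epoch_steps = hw_epochs[0][1] - hw_epochs[0][0]
hw_epoch_seconds = hw_epoch_steps / sample_rate_hz
hw_total_seconds = 1250 / sample_rate_hz
```

Under these assumptions each interval in the first setting spans 100 steps, and each interval in the second spans 250 samples, which matches the stated 0.5 seconds per interval and 2.5 seconds overall at 500 Hz.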