Robust exploration in linear quadratic reinforcement learning

Authors: Jack Umenberger, Mina Ferizbegovic, Thomas B. Schön, Håkan Hjalmarsson

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical simulations and application to a hardware-in-the-loop servo-mechanism demonstrate the approach, with appreciable performance and robustness gains over alternative methods observed in both.
Researcher Affiliation | Academia | Jack Umenberger, Department of Information Technology, Uppsala University, Sweden (jack.umenberger@it.uu.se); Mina Ferizbegovic, School of Electrical Engineering and Computer Science, KTH, Sweden (minafe@kth.se); Thomas B. Schön, Department of Information Technology, Uppsala University, Sweden (thomas.schon@it.uu.se); Håkan Hjalmarsson, School of Electrical Engineering and Computer Science, KTH, Sweden (hjalmars@kth.se)
Pseudocode | Yes | Algorithm 1: Receding horizon application to true system
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper uses data obtained from simulations and a physical servo mechanism. This data is custom-generated and is not a publicly available dataset with concrete access information.
Dataset Splits | No | The paper describes how initial data is obtained and used for trials, but it does not specify explicit training, validation, and test dataset splits.
Hardware Specification | Yes | [...] for a hardware-in-the-loop simulation comprising the interconnection of a physical servo mechanism (Quanser QUBE 2) and a synthetic (simulated) LTI dynamical system.
Software Dependencies | No | The paper mentions techniques such as convex optimization and semidefinite programming, but it does not specify any particular software libraries, tools, or version numbers that were used.
Experiment Setup | Yes | We partition the time horizon T = 10^3 into N = 10 equally spaced intervals, each of length T_i = 100. For robustness, we set δ = 0.05. [...] with look-ahead horizon h = 10. [...] The total control horizon was T = 1250 (2.5 seconds at 500 Hz) and was divided into N = 5 intervals, each of duration 0.5 seconds.
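
The interval arithmetic quoted in the Experiment Setup row can be checked with a short Python sketch. It is only an illustration of the partitioning, not code from the paper: the function name `partition_horizon` and the variable names are hypothetical, and the sketch assumes the intervals are contiguous and of equal length.

```python
def partition_horizon(total_steps, num_intervals):
    """Split range(total_steps) into equal, contiguous (start, end) pairs.

    `end` is exclusive. Hypothetical helper, not taken from the paper.
    """
    if total_steps % num_intervals != 0:
        raise ValueError("horizon must divide evenly into the intervals")
    length = total_steps // num_intervals
    return [(i * length, (i + 1) * length) for i in range(num_intervals)]


# First quoted setting: T = 10^3 time steps, N = 10 intervals.
sim_intervals = partition_horizon(10**3, 10)
sim_interval_length = sim_intervals[0][1] - sim_intervals[0][0]

# Second quoted setting: T = 1250 time steps sampled at 500 Hz, N = 5 intervals.
SAMPLE_RATE_HZ = 500
hw_intervals = partition_horizon(1250, 5)
hw_total_seconds = 1250 / SAMPLE_RATE_HZ
hw_interval_seconds = (hw_intervals[0][1] - hw_intervals[0][0]) / SAMPLE_RATE_HZ
```

Under these assumptions, each interval in the first setting spans 100 time steps, and each interval in the second spans 250 time steps, which corresponds to the stated 0.5 seconds at 500 Hz.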