reproducibilityindex.ai

Robust exploration in linear quadratic reinforcement learning

Authors: Jack Umenberger, Mina Ferizbegovic, Thomas B. Schön, Håkan Hjalmarsson

NeurIPS 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Numerical simulations and application to a hardware-in-the-loop servo-mechanism demonstrate the approach, with appreciable performance and robustness gains over alternative methods observed in both.
Researcher Affiliation	Academia	Jack Umenberger, Department of Information Technology, Uppsala University, Sweden (jack.umenberger@it.uu.se); Mina Ferizbegovic, School of Electrical Engineering and Computer Science, KTH, Sweden (minafe@kth.se); Thomas B. Schön, Department of Information Technology, Uppsala University, Sweden (thomas.schon@it.uu.se); Håkan Hjalmarsson, School of Electrical Engineering and Computer Science, KTH, Sweden (hjalmars@kth.se)
Pseudocode	Yes	Algorithm 1 Receding horizon application to true system
Open Source Code	No	The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets	No	The paper uses data obtained from simulations and a physical servo mechanism; these data are custom-generated and do not form a publicly available dataset with concrete access information.
Dataset Splits	No	The paper describes how initial data is obtained and used for trials, but it does not specify explicit training, validation, and test dataset splits.
Hardware Specification	Yes	for a hardware-in-the-loop simulation comprised of the interconnection of a physical servo mechanism (Quanser QUBE 2) and a synthetic (simulated) LTI dynamical system.
Software Dependencies	No	The paper mentions techniques such as convex optimization and semidefinite programming, but it does not specify any particular software libraries, tools, or version numbers that were used.
Experiment Setup	Yes	We partition the time horizon T = 10^3 into N = 10 equally spaced intervals, each of length T_i = 100. [...] For robustness, we set δ = 0.05 [...] with look-ahead horizon h = 10. [...] The total control horizon was T = 1250 (2.5 seconds at 500 Hz) and was divided into N = 5 intervals, each of duration 0.5 seconds.
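
The interval arithmetic quoted in the Experiment Setup row can be checked with a short sketch. This only illustrates how the stated horizons divide into equal intervals; it is not the authors' code, and the helper name partition_horizon and the variable names are hypothetical.

```python
def partition_horizon(total_steps: int, num_intervals: int) -> list[tuple[int, int]]:
    """Split the step range [0, total_steps) into equal half-open intervals.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if total_steps % num_intervals != 0:
        raise ValueError("horizon is not evenly divisible by the number of intervals")
    length = total_steps // num_intervals
    return [(i * length, (i + 1) * length) for i in range(num_intervals)]


# Numerical simulations: T = 10^3 steps split into N = 10 intervals.
sim_intervals = partition_horizon(1000, 10)

# Hardware-in-the-loop experiment: T = 1250 steps split into N = 5 intervals.
hil_intervals = partition_horizon(1250, 5)

# Convert the hardware-in-the-loop values to seconds at the stated 500 Hz rate.
sample_rate_hz = 500
hil_total_seconds = 1250 / sample_rate_hz
hil_interval_seconds = (hil_intervals[0][1] - hil_intervals[0][0]) / sample_rate_hz
```

Under these assumptions each simulation interval spans 100 steps, and each hardware-in-the-loop interval spans 250 steps, which matches the stated 0.5 seconds per interval and 2.5 seconds in total.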