Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems

Authors: Akshay Mete, Rahul Singh, P. R. Kumar

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We perform extensive simulation studies showing that the Augmented RBMLE consistently outperforms UCB, Thompson Sampling and StabL by a huge margin, while it is marginally better than Input Perturbation and moderately better than Randomized Certainty Equivalence." and, from Section 5 (Empirical Performance): "We evaluate the empirical performance of ARBMLE as well as standard (unaugmented) RBMLE. We compare these algorithms with OFULQ [3], Thompson Sampling (TS) [15], Input Perturbations (IE) [16], Randomized Certainty Equivalence (RCE) [10], and StabL [17]. The results shown here are for the following examples of linear systems that have appeared in the recent literature on adaptive control of linear systems: 1. Unstable Laplacian dynamics [18, 17, 19]. 2. Large transient dynamics [18]. 3. Unmanned Aerial Vehicle (UAV) [20, 17]. 4. Longitudinal Flight Control of Boeing 747 [17]."
Researcher Affiliation | Academia | Akshay Mete, Texas A&M University, College Station, Texas, USA, akshaymete@tamu.edu; Rahul Singh, Indian Institute of Science, Bengaluru, Karnataka, India, rahulsingh@iisc.ac.in; P. R. Kumar, Texas A&M University, College Station, Texas, USA, prk@tamu.edu
Pseudocode | Yes | Algorithm 1 Augmented RBMLE-UCB (ARBMLE). Initialize: t = 0, Z_0 = λ I_{n+m}. for k = 0, 1, ... do: if det(Z_t) > 2 det(Z_{t_{k-1}}) then solve the following optimization to obtain θ_{t_k}: θ_t ∈ arg min_{θ ∈ S ∩ C_{t_k}(δ)} {V_{t_k}(θ) + α(t_k) J(θ)}; else θ_t = θ_{t-1}; end if; u_t = K(θ_t) x_t; Z_{t+1} = Z_t + z_t z_t^⊤; t ← t + 1; end for
Open Source Code | No | The paper's ethics checklist indicates that code is included in the supplementary material or via a URL (Questions 3a and 4c: Yes). However, the main text does not provide a URL or state where the source code for the methodology can be found.
Open Datasets | Yes | "The examples used for our simulation study have been used in many recent papers [18, 19, 17], namely (a) the longitudinal flight control of Boeing 747 with linearized dynamics [17], (b) Unmanned Aerial Vehicle (UAV) [20, 17], (c) unstable Laplacian dynamics [18], and (d) large transient dynamics [18]."
Dataset Splits | No | The paper describes running simulations over a time horizon of 500 steps and repeating each experiment 50 times to average the results, but it does not specify training, validation, or test splits. The problem setup is online control, so there is no static dataset to partition.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments or simulations.
Software Dependencies | No | The paper does not provide specific software dependencies or their version numbers for reproducibility.
Experiment Setup | Yes | "Each simulation experiment is performed for a time horizon of 500 steps, and repeated 50 times. The reported results are the averaged values over the 100 runs." and "where the bias-term, α(t) = α0/√T, t, for α0 > 0."
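
The loop in the Pseudocode row and the protocol in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions. The system is a hypothetical scalar one (n = m = 1). V is taken to be the regularized least-squares loss and J the optimal average LQ cost. The parameter set S is a small hypothetical grid, and the confidence set C(δ) is omitted entirely. The bias schedule `alpha0 * sqrt(t)` is an illustrative placeholder that should not be read as the paper's choice. All function names (`riccati`, `select_theta`, `run_once`, `average_cost`) are invented for this example.

```python
import math
import random

Q, R = 1.0, 1.0  # illustrative stage cost q*x^2 + r*u^2 (assumed values)


def riccati(a, b, iters=100):
    """Iterate the scalar Riccati recursion; return (p, k) with u = k * x."""
    p = Q
    for _ in range(iters):
        p = Q + a * a * p - (a * b * p) ** 2 / (R + b * b * p)
    return p, -(a * b * p) / (R + b * b * p)


# Hypothetical parameter set S: a grid over a in [0, 1.2], b in [0.8, 1.25].
# Every gain on this box stabilizes the illustrative true system used below.
GRID = [(a, b) + riccati(a, b)
        for a in [0.06 * i for i in range(21)]
        for b in [0.8 + 0.0225 * j for j in range(21)]]


def det2(z):
    """Determinant of a 2x2 matrix stored as nested lists."""
    return z[0][0] * z[1][1] - z[0][1] * z[1][0]


def select_theta(z, s, alpha):
    """Grid argmin of V(theta) + alpha * J(theta); C(delta) is omitted."""
    def objective(entry):
        a, b, p, _ = entry
        # V: regularized least-squares loss, up to a theta-independent constant.
        v = (a * a * z[0][0] + 2 * a * b * z[0][1] + b * b * z[1][1]
             - 2 * (a * s[0] + b * s[1]))
        return v + alpha * p  # J(theta) = p for unit-variance noise
    return min(GRID, key=objective)


def run_once(seed, horizon=500, lam=1.0, alpha0=0.1, a_true=0.9, b_true=1.0):
    """Simulate one run; return (average cost, number of parameter updates)."""
    rng = random.Random(seed)
    z = [[lam, 0.0], [0.0, lam]]      # Z_0 = lambda * I
    s = [0.0, 0.0]                    # running sum of z_t * x_{t+1}
    det_last = det2(z)                # det(Z) at the most recent update
    theta = select_theta(z, s, 0.0)   # initial parameter choice
    x, total_cost, updates = 0.0, 0.0, 0
    for t in range(horizon):
        d = det2(z)
        if d > 2 * det_last:          # determinant-doubling test
            theta = select_theta(z, s, alpha0 * math.sqrt(t))  # placeholder bias
            det_last = d
            updates += 1
        u = theta[3] * x              # u_t = K(theta_t) x_t
        total_cost += Q * x * x + R * u * u
        x_next = a_true * x + b_true * u + rng.gauss(0.0, 1.0)
        # Z_{t+1} = Z_t + z_t z_t^T with z_t = (x_t, u_t)
        z[0][0] += x * x
        z[0][1] += x * u
        z[1][0] += x * u
        z[1][1] += u * u
        s[0] += x * x_next
        s[1] += u * x_next
        x = x_next
    return total_cost / horizon, updates


def average_cost(runs=50, **kwargs):
    """Average the per-run cost over independent seeds."""
    return sum(run_once(seed, **kwargs)[0] for seed in range(runs)) / runs
```

Calling `average_cost()` with its defaults mirrors the stated protocol of a 500-step horizon repeated 50 times. Because the parameter is re-estimated only when det(Z_t) doubles, the number of updates grows slowly with the horizon, which keeps the cost of repeatedly solving the optimization low.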