Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems
Authors: Akshay Mete, Rahul Singh, P. R. Kumar
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive simulation studies showing that the Augmented RBMLE consistently outperforms UCB, Thompson Sampling and StabL by a huge margin, while it is marginally better than Input Perturbation and moderately better than Randomized Certainty Equivalence. and Section 5, Empirical Performance: We evaluate the empirical performance of ARBMLE as well as standard (unaugmented) RBMLE. We compare these algorithms with OFULQ [3], Thompson Sampling (TS) [15], Input Perturbations (IE) [16], Randomized Certainty Equivalence (RCE) [10], and StabL [17]. The results shown here are for the following examples of linear systems that have appeared in the recent literature on adaptive control of linear systems: 1. Unstable Laplacian dynamics [18, 17, 19]. 2. Large transient dynamics [18]. 3. Unmanned Aerial Vehicle (UAV) [20, 17]. 4. Longitudinal Flight Control of Boeing 747 [17]. |
| Researcher Affiliation | Academia | Akshay Mete, Texas A&M University, College Station, Texas, USA (akshaymete@tamu.edu); Rahul Singh, Indian Institute of Science, Bengaluru, Karnataka, India (rahulsingh@iisc.ac.in); P. R. Kumar, Texas A&M University, College Station, Texas, USA (prk@tamu.edu) |
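| Pseudocode | Yes | Algorithm 1 Augmented RBMLE-UCB (ARBMLE). Initialize: $t = 0$, $Z_0 = \lambda I_{n+m}$. for $k = 0, 1, \dots$ do: if $\det(Z_t) > 2\det(Z_{t_{k-1}})$ then solve the following optimization to obtain $\theta_{t_k}$: $\theta_t \in \arg\min_{\theta \in S \cap C_{t_k}(\delta)} \{V_{t_k}(\theta) + \alpha(t_k) J(\theta)\}$; else $\theta_t = \theta_{t-1}$; end if; $u_t = K(\theta_t) x_t$; $Z_{t+1} = Z_t + z_t z_t^\top$; $t \leftarrow t + 1$; end for. |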
| Open Source Code | No | The paper's ethics checklist indicates that code is included in the supplementary material or via a URL (Questions 3a and 4c: Yes). However, the main text neither provides a URL nor states where the source code for the methodology can be found. |
| Open Datasets | Yes | The examples used for our simulation study have been used in many recent papers [18, 19, 17], namely (a) the longitudinal flight control of Boeing 747 with linearized dynamics [17], (b) Unmanned Aerial Vehicle (UAV) [20, 17], (c) unstable Laplacian dynamics [18], and (d) large transient dynamics [18]. |
| Dataset Splits | No | The paper describes running simulations for a time horizon of 500 steps and repeating experiments 50 times for averaging results, but it does not specify training, validation, or test dataset splits in the conventional machine learning sense for data partitioning. The problem setup involves online control rather than static dataset splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments or simulations. |
| Software Dependencies | No | The paper does not provide specific software dependencies or their version numbers for reproducibility. |
| Experiment Setup | Yes | Each simulation experiment is performed for a time horizon of 500 steps, and repeated 50 times. The reported results are the averaged values over the 50 runs. and where the bias-term, α(t) = α0/√T, t, for α0 > 0. |
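
The control loop quoted in the Pseudocode row and the averaging protocol in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a scalar system (n = m = 1), and it replaces the constrained ARBMLE optimization with a plain regularized least-squares estimate plus a small input dither for exploration. The function names, the true parameters `a_true` and `b_true`, the `dither` level, and the cost weights are all hypothetical choices made for illustration. What it does preserve is the determinant-doubling update rule, the feedback law u_t = K(θ_t) x_t, and the recursion Z_{t+1} = Z_t + z_t z_tᵀ.

```python
import random


def lqr_gain(a, b, q=1.0, r=1.0, iters=200):
    """Scalar LQR gain K (for u = K * x), via Riccati value iteration."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return -(a * b * p) / (r + b * b * p)


def det2(z):
    """Determinant of a 2x2 matrix stored as nested lists."""
    return z[0][0] * z[1][1] - z[0][1] * z[1][0]


def ridge_estimate(z, s):
    """Solve Z theta = s for theta = (a_hat, b_hat) by Cramer's rule."""
    d = det2(z)
    return ((s[0] * z[1][1] - s[1] * z[0][1]) / d,
            (s[1] * z[0][0] - s[0] * z[1][0]) / d)


def run_once(a_true=0.9, b_true=0.5, horizon=500, lam=1.0,
             dither=0.1, seed=0):
    """One simulated run; returns (average cost, policy updates, det(Z_T))."""
    rng = random.Random(seed)
    z = [[lam, 0.0], [0.0, lam]]   # Z_0 = lambda * I
    s = [0.0, 0.0]                 # running sum of z_t * x_{t+1}
    det_last = det2(z)             # det(Z) at the most recent update
    k_gain = 0.0                   # hypothetical initial policy: no feedback
    x, cost, updates = 0.0, 0.0, 0
    for _ in range(horizon):
        # Re-estimate only when det(Z_t) has doubled since the last update.
        if det2(z) > 2.0 * det_last:
            # Stand-in for the ARBMLE optimization (an assumption).
            a_hat, b_hat = ridge_estimate(z, s)
            k_gain = lqr_gain(a_hat, b_hat)
            det_last = det2(z)
            updates += 1
        # u_t = K(theta_t) x_t, plus hypothetical exploration dither.
        u = k_gain * x + dither * rng.gauss(0.0, 1.0)
        cost += x * x + u * u
        x_next = a_true * x + b_true * u + rng.gauss(0.0, 1.0)
        zt = (x, u)
        for i in range(2):
            s[i] += zt[i] * x_next
            for j in range(2):
                z[i][j] += zt[i] * zt[j]   # Z_{t+1} = Z_t + z_t z_t^T
        x = x_next
    return cost / horizon, updates, det2(z)


def average_cost(n_runs=50, horizon=500):
    """Average per-step cost over independent runs, one seed per run."""
    total = sum(run_once(horizon=horizon, seed=i)[0] for i in range(n_runs))
    return total / n_runs


if __name__ == "__main__":
    print("updates in one run:", run_once()[1])
    print("average cost over 50 runs:", average_cost())
```

Because each update requires the determinant to have at least doubled, the number of policy updates in a run is at most log2(det(Z_T) / det(Z_0)), so the estimation step runs only logarithmically often in the amount of collected data.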