Task-Optimal Exploration in Linear Dynamical Systems

Authors: Andrew J Wagenmaker, Max Simchowitz, Kevin Jamieson

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conclude with several experiments illustrating the effectiveness of our approach in practice. Finally, we show that task-guided exploration yields practical gains. Figures 1, 2, and 3 illustrate the performance of TOPLE on several instances of the pure-exploration LQR problem. We compare against the baselines presented in Section 4 and the oracle task-optimal algorithm (which we refer to as TOPLE Oracle). For all baselines, we compute the inputs in an oracle, offline manner, using knowledge of A* and B*, and play them for the entire trajectory. Our implementation of TOPLE follows precisely the formal statement of the algorithm given in Appendix B, and we rely on the aforementioned convex relaxation and a projected gradient descent solution to efficiently solve the experiment design problem. This convex relaxation can, in fact, also be applied to the optimal operator norm identification algorithm, rendering the algorithm from (Wagenmaker & Jamieson, 2020) computationally efficient. We therefore rely on this relaxation and a projected subgradient descent method in our implementation of the operator norm identification algorithm. All data points correspond to averaging over at least 50 runs of the algorithm. Additional details and plots with error bars are provided in Appendix M.
Researcher Affiliation | Academia | 1 Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, Washington, USA 2 Department of Electrical Engineering and Computer Science, University of California, Berkeley, California, USA. Correspondence to: Andrew Wagenmaker <ajwagen@cs.washington.edu>.
Pseudocode | Yes | Algorithm 1: Task-OPtimaL Experiment design (TOPLE), Informal
Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper focuses on theoretical models and numerical simulations of linear dynamical systems and the LQR problem. It does not explicitly mention using or providing access information for any publicly available or open dataset.
Dataset Splits | No | The paper describes experiments based on simulations and averaging over runs ('All data points correspond to averaging over at least 50 runs of the algorithm.'), but it does not specify training, validation, or test dataset splits.
Hardware Specification | No | The paper describes its implementation and numerical experiments but does not provide any specific details about the hardware used (e.g., GPU models, CPU types, memory specifications).
Software Dependencies | No | The paper mentions using a 'projected gradient descent solution' and a 'projected subgradient descent method' for its implementation, but it does not list any specific software components or libraries with their version numbers.
Experiment Setup | Yes | Figures 2 and 3 illustrate performance on the instances stated in Proposition 4.1 and Proposition 4.2, respectively. Every point in the plot corresponds to the LQR loss obtained after T = 60000 steps. As these plots clearly illustrate, the theoretical gains stated in Proposition 4.1 and Proposition 4.2 appear in practice as well: there is a clear improvement in terms of the scaling in dx when performing task-guided exploration, even over moderate time regimes. Figure 1 illustrates the performance of TOPLE on a more typical problem instance: A* a single Jordan block and B*, Rx, and Ru randomly generated. Figure 1 gives the average loss versus time obtained by averaging the performance over 15 different realizations of B*, Rx, Ru. As in the previous examples, TOPLE outperforms all other approaches. All data points correspond to averaging over at least 50 runs of the algorithm.
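The experiment-setup row describes instances where A* is a single Jordan block and B* is randomly generated. A minimal sketch of constructing such an instance and simulating the linear dynamics x_{t+1} = A x_t + B u_t + w_t is shown below; the helper names (`jordan_block`, `rollout`) and the simple state-feedback rollout are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def jordan_block(dim, eig):
    """Single Jordan block: `eig` on the diagonal, ones on the superdiagonal."""
    A = np.eye(dim) * eig
    A[np.arange(dim - 1), np.arange(1, dim)] = 1.0
    return A

def rollout(A, B, K, T, noise_std=1.0, rng=None):
    """Simulate x_{t+1} = A x_t + B u_t + w_t with u_t = K x_t;
    return the array of visited states (T + 1 rows)."""
    rng = np.random.default_rng(rng)
    dx = A.shape[0]
    x = np.zeros(dx)
    xs = [x]
    for _ in range(T):
        u = K @ x
        x = A @ x + B @ u + noise_std * rng.standard_normal(dx)
        xs.append(x)
    return np.array(xs)
```

With B*, Rx, and Ru drawn at random (as the row describes) and losses averaged over many rollouts, this reproduces the general shape of the averaging protocol quoted above, though not the paper's specific exploration inputs.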
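The implementation notes quoted above mention solving the experiment design problem via a convex relaxation and projected gradient descent. As an illustration of that style of optimization only (not the paper's actual objective or solver), the sketch below runs projected gradient descent on a generic A-optimal design objective tr((L + reg*I)^{-1}) over input covariance matrices with a fixed trace budget; the function names, objective, and constraint set are all assumptions.

```python
import numpy as np

def project_psd_trace(M, budget):
    """Project a symmetric matrix onto {X >= 0, tr(X) = budget}:
    eigendecompose, then project the eigenvalues onto the scaled simplex."""
    w, V = np.linalg.eigh(M)
    u = np.sort(w)[::-1]                 # eigenvalues, descending
    css = np.cumsum(u) - budget
    ks = np.arange(1, len(u) + 1)
    rho = np.max(ks[u - css / ks > 0])   # active-set size for the projection
    theta = css[rho - 1] / rho           # uniform shift enforcing the budget
    w_proj = np.maximum(w - theta, 0.0)
    return V @ np.diag(w_proj) @ V.T

def design_inputs(dim, budget, iters=500, lr=0.1, reg=1e-3):
    """Projected gradient descent on f(L) = tr((L + reg*I)^{-1}),
    an A-optimal style criterion, over trace-constrained PSD matrices."""
    L = np.eye(dim) * (budget / dim)
    I = np.eye(dim)
    for _ in range(iters):
        inv = np.linalg.inv(L + reg * I)
        grad = -inv @ inv                # gradient of tr((L + reg*I)^{-1})
        L = project_psd_trace(L - lr * grad, budget)
    return L
```

For this isotropic objective the minimizer is the scaled identity, which makes the sketch easy to sanity-check; the paper's task-specific criterion would replace `f` while keeping the same project-then-step structure.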