Task-Optimal Exploration in Linear Dynamical Systems
Authors: Andrew J Wagenmaker, Max Simchowitz, Kevin Jamieson
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conclude with several experiments illustrating the effectiveness of our approach in practice. Finally, we show that task-guided exploration yields practical gains. Figures 1, 2, and 3 illustrate the performance of TOPLE on several instances of the pure-exploration LQR problem. We compare against the baselines presented in Section 4 and the oracle task-optimal algorithm (which we refer to as TOPLE Oracle ). For all baselines, we compute the inputs in an oracle, offline manner, using knowledge of A? and B?, and play them for the entire trajectory. Our implementation of TOPLE follows precisely the formal statement of the algorithm given in Appendix B, and we rely on the aforementioned convex relaxation and a projected gradient descent solution to efficiently solve the experiment design problem. This convex relaxation can, in fact, also be applied to the optimal operator norm identification algorithm, rendering the algorithm from (Wagenmaker & Jamieson, 2020) computationally efficient. We therefore rely on this relaxation and a projected subgradient descent method in our implementation of the operator norm identification algorithm. All data points correspond to averaging over at least 50 runs of the algorithm. Additional details and plots with error bars are provided in Appendix M. |
| Researcher Affiliation | Academia | 1Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, Washington, USA 2Department of Electrical Engineering and Computer Science, University of California, Berkeley, California, USA. Correspondence to: Andrew Wagenmaker <ajwagen@cs.washington.edu>. |
| Pseudocode | Yes | Algorithm 1 Task-OPtima L Experiment Design (TOPLE), Informal |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper focuses on theoretical models and numerical simulations of linear dynamical systems and the LQR problem. It does not explicitly mention using or providing access information for any publicly available or open dataset. |
| Dataset Splits | No | The paper describes experiments based on simulations and averaging over runs ('All data points correspond to averaging over at least 50 runs of the algorithm.'), but it does not specify training, validation, or test dataset splits. |
| Hardware Specification | No | The paper describes its implementation and numerical experiments but does not provide any specific details about the hardware used (e.g., GPU models, CPU types, memory specifications). |
| Software Dependencies | No | The paper mentions using a 'projected gradient descent solution' and a 'projected subgradient descent method' for its implementation, but it does not list any specific software components or libraries with their version numbers. |
| Experiment Setup | Yes | Figures 2 and 3 illustrate performance on the instances stated in Proposition 4.1 and Proposition 4.2, respectively. Every point in the plot corresponds to the LQR loss obtained after T = 60000 steps. As these plots clearly illustrate, the theoretical gains stated in Proposition 4.1 and Proposition 4.2 appear in practice as well there is a clear improvement in terms of the scaling in and dx when performing task-guided exploration, even over moderate time regimes. Figure 1 illustrates the performance of TOPLE on a more typical problem instance: A? a single Jordan block and B?, Rx, and Ru randomly generated. Figure 1 gives the average loss versus time obtained by averaging the performance over 15 different realizations of B?, Rx, Ru. As in the previous examples, TOPLE outperforms all other approaches. All data points correspond to averaging over at least 50 runs of the algorithm. |