Task-Optimal Exploration in Linear Dynamical Systems

Authors: Andrew J Wagenmaker, Max Simchowitz, Kevin Jamieson

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conclude with several experiments illustrating the effectiveness of our approach in practice. Finally, we show that task-guided exploration yields practical gains. Figures 1, 2, and 3 illustrate the performance of TOPLE on several instances of the pure-exploration LQR problem. We compare against the baselines presented in Section 4 and the oracle task-optimal algorithm (which we refer to as TOPLE Oracle). For all baselines, we compute the inputs in an oracle, offline manner, using knowledge of A* and B*, and play them for the entire trajectory. Our implementation of TOPLE follows precisely the formal statement of the algorithm given in Appendix B, and we rely on the aforementioned convex relaxation and a projected gradient descent solution to efficiently solve the experiment design problem. This convex relaxation can, in fact, also be applied to the optimal operator norm identification algorithm, rendering the algorithm from (Wagenmaker & Jamieson, 2020) computationally efficient. We therefore rely on this relaxation and a projected subgradient descent method in our implementation of the operator norm identification algorithm. All data points correspond to averaging over at least 50 runs of the algorithm. Additional details and plots with error bars are provided in Appendix M.
Researcher Affiliation | Academia | 1 Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, Washington, USA 2 Department of Electrical Engineering and Computer Science, University of California, Berkeley, California, USA. Correspondence to: Andrew Wagenmaker <ajwagen@cs.washington.edu>.
Pseudocode | Yes | Algorithm 1: Task-OPtimaL Experiment design (TOPLE), Informal
Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper focuses on theoretical models and numerical simulations of linear dynamical systems and the LQR problem. It does not explicitly mention using or providing access information for any publicly available or open dataset.
Dataset Splits | No | The paper describes experiments based on simulations and averaging over runs ('All data points correspond to averaging over at least 50 runs of the algorithm.'), but it does not specify training, validation, or test dataset splits.
Hardware Specification | No | The paper describes its implementation and numerical experiments but does not provide any specific details about the hardware used (e.g., GPU models, CPU types, memory specifications).
Software Dependencies | No | The paper mentions using a 'projected gradient descent solution' and a 'projected subgradient descent method' for its implementation, but it does not list any specific software components or libraries with their version numbers.
Experiment Setup | Yes | Figures 2 and 3 illustrate performance on the instances stated in Proposition 4.1 and Proposition 4.2, respectively. Every point in the plot corresponds to the LQR loss obtained after T = 60000 steps. As these plots clearly illustrate, the theoretical gains stated in Proposition 4.1 and Proposition 4.2 appear in practice as well: there is a clear improvement in terms of the scaling in dx when performing task-guided exploration, even over moderate time regimes. Figure 1 illustrates the performance of TOPLE on a more typical problem instance: A* a single Jordan block and B*, Rx, and Ru randomly generated. Figure 1 gives the average loss versus time obtained by averaging the performance over 15 different realizations of B*, Rx, Ru. As in the previous examples, TOPLE outperforms all other approaches. All data points correspond to averaging over at least 50 runs of the algorithm.
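The experiment-setup row describes instances where A* is a single Jordan block and B* is randomly generated. A minimal sketch of constructing such an instance and simulating the linear dynamics x_{t+1} = A x_t + B u_t + w_t is shown below; the helper names (`jordan_block`, `rollout`) and the simple state-feedback rollout are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def jordan_block(dim, eig):
    """Single Jordan block: `eig` on the diagonal, ones on the superdiagonal."""
    A = np.eye(dim) * eig
    A[np.arange(dim - 1), np.arange(1, dim)] = 1.0
    return A

def rollout(A, B, K, T, noise_std=1.0, rng=None):
    """Simulate x_{t+1} = A x_t + B u_t + w_t with u_t = K x_t;
    return the array of visited states (T + 1 rows)."""
    rng = np.random.default_rng(rng)
    dx = A.shape[0]
    x = np.zeros(dx)
    xs = [x]
    for _ in range(T):
        u = K @ x
        x = A @ x + B @ u + noise_std * rng.standard_normal(dx)
        xs.append(x)
    return np.array(xs)
```

With B*, Rx, and Ru drawn at random (as the row describes) and losses averaged over many rollouts, this reproduces the general shape of the averaging protocol quoted above, though not the paper's specific exploration inputs.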
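The implementation notes quoted above mention solving the experiment design problem via a convex relaxation and projected gradient descent. As an illustration of that style of optimization only (not the paper's actual objective or solver), the sketch below runs projected gradient descent on a generic A-optimal design objective tr((L + reg*I)^{-1}) over input covariance matrices with a fixed trace budget; the function names, objective, and constraint set are all assumptions.

```python
import numpy as np

def project_psd_trace(M, budget):
    """Project a symmetric matrix onto {X >= 0, tr(X) = budget}:
    eigendecompose, then project the eigenvalues onto the scaled simplex."""
    w, V = np.linalg.eigh(M)
    u = np.sort(w)[::-1]                 # eigenvalues, descending
    css = np.cumsum(u) - budget
    ks = np.arange(1, len(u) + 1)
    rho = np.max(ks[u - css / ks > 0])   # active-set size for the projection
    theta = css[rho - 1] / rho           # uniform shift enforcing the budget
    w_proj = np.maximum(w - theta, 0.0)
    return V @ np.diag(w_proj) @ V.T

def design_inputs(dim, budget, iters=500, lr=0.1, reg=1e-3):
    """Projected gradient descent on f(L) = tr((L + reg*I)^{-1}),
    an A-optimal style criterion, over trace-constrained PSD matrices."""
    L = np.eye(dim) * (budget / dim)
    I = np.eye(dim)
    for _ in range(iters):
        inv = np.linalg.inv(L + reg * I)
        grad = -inv @ inv                # gradient of tr((L + reg*I)^{-1})
        L = project_psd_trace(L - lr * grad, budget)
    return L
```

For this isotropic objective the minimizer is the scaled identity, which makes the sketch easy to sanity-check; the paper's task-specific criterion would replace `f` while keeping the same project-then-step structure.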