Naive Exploration is Optimal for Online LQR

Authors: Max Simchowitz, Dylan Foster

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We prove new upper and lower bounds demonstrating that the optimal regret scales as $\widetilde{\Theta}(\sqrt{d_u^2 d_x T})$.
Researcher Affiliation | Academia | UC Berkeley; Massachusetts Institute of Technology. Correspondence to: Max Simchowitz <msimchow@berkeley.edu>.
Pseudocode | Yes | Our main algorithm, Algorithm 1, is an ε-greedy scheme. The full pseudocode and analysis are deferred to Appendix H, but we sketch the intuition here.
Open Source Code | No | The paper does not provide any explicit statements or links indicating that its source code is open or publicly available.
Open Datasets | No | The paper is theoretical and focuses on mathematical proofs and bounds for online LQR. It does not describe experiments run on a specific dataset or provide access information for a public training dataset.
Dataset Splits | No | The paper is theoretical and focuses on mathematical proofs and bounds. It does not describe empirical experiments involving dataset splits for validation.
Hardware Specification | No | The paper is theoretical and focuses on mathematical proofs and bounds. It does not describe any specific hardware used for running experiments.
Software Dependencies | No | The paper is theoretical and focuses on mathematical proofs and bounds. It does not describe any specific software dependencies with version numbers for experimental reproducibility.
Experiment Setup | No | The paper is theoretical and focuses on mathematical proofs and bounds. It does not describe an empirical experimental setup with hyperparameters or training configurations.
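To make the regret scaling quoted in the Research Type row concrete, the short worked calculation below may help. It ignores constants and the logarithmic factors hidden by the tilde, and it assumes the paper's notation in which $T$ is the number of time steps, $d_u$ the input (control) dimension, and $d_x$ the state dimension. These symbol meanings are not spelled out on this page.

```latex
\mathrm{Regret}_T = \widetilde{\Theta}\left(\sqrt{d_u^2 d_x T}\right) = \widetilde{\Theta}\left(d_u \sqrt{d_x T}\right)

\frac{\sqrt{(2d_u)^2 d_x T}}{\sqrt{d_u^2 d_x T}} = 2,
\qquad
\frac{\sqrt{d_u^2 (2d_x) T}}{\sqrt{d_u^2 d_x T}} = \sqrt{2},
\qquad
\frac{\sqrt{d_u^2 d_x (4T)}}{\sqrt{d_u^2 d_x T}} = 2.
```

In words, doubling the input dimension doubles the regret, doubling the state dimension multiplies it by $\sqrt{2}$, and the horizon must quadruple before the regret doubles.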
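The Pseudocode row describes Algorithm 1 only as an ε-greedy scheme. As a rough illustration of that style of "naive exploration" (certainty-equivalent control plus random exploration noise that decays over time), here is a minimal NumPy sketch. It is not the paper's Algorithm 1. The example system, the exploration schedule $\sigma_t^2 = 1/\sqrt{t}$, the doubling re-estimation schedule, the zero initial controller (which assumes the true $A$ is already stable), and the helper names `riccati_gain`, `fit_dynamics`, and `naive_exploration_lqr` are all hypothetical choices made for this example.

```python
import numpy as np


def riccati_gain(A, B, Q, R, iters=500):
    """Hypothetical helper: approximate the infinite-horizon LQR gain K
    (control law u = K x) by iterating the discrete-time Riccati recursion."""
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        K = -np.linalg.solve(R + BtP @ B, BtP @ A)
        # P <- Q + A'PA - A'PB (R + B'PB)^{-1} B'PA, written using K
        P = Q + A.T @ P @ A + A.T @ P @ B @ K
    return K


def fit_dynamics(Z, Y, dx):
    """Hypothetical helper: least-squares fit of x_{t+1} ~ A x_t + B u_t."""
    Theta, *_ = np.linalg.lstsq(np.asarray(Z), np.asarray(Y), rcond=None)
    AB = Theta.T  # shape (dx, dx + du), i.e. [A_hat, B_hat]
    return AB[:, :dx], AB[:, dx:]


def naive_exploration_lqr(A, B, T=2000, seed=0):
    """Certainty-equivalent control with decaying Gaussian exploration.
    Assumption: A is stable, so the initial controller K = 0 is safe."""
    rng = np.random.default_rng(seed)
    dx, du = B.shape
    Q, R = np.eye(dx), np.eye(du)
    K = np.zeros((du, dx))
    x = np.zeros(dx)
    Z, Y = [], []  # regressors z_t = (x_t, u_t) and targets x_{t+1}
    next_refit = 100  # assumed doubling schedule for re-estimation
    for t in range(1, T + 1):
        sigma = t ** -0.25  # exploration variance sigma^2 = 1/sqrt(t)
        u = K @ x + sigma * rng.standard_normal(du)
        x_next = A @ x + B @ u + rng.standard_normal(dx)  # unit process noise
        Z.append(np.concatenate([x, u]))
        Y.append(x_next)
        x = x_next
        if t == next_refit:
            A_hat, B_hat = fit_dynamics(Z, Y, dx)
            K = riccati_gain(A_hat, B_hat, Q, R)  # certainty-equivalent gain
            next_refit *= 2
    A_hat, B_hat = fit_dynamics(Z, Y, dx)
    return A_hat, B_hat, riccati_gain(A_hat, B_hat, Q, R)


# Hypothetical example system: stable A, fully actuated B.
A = 0.9 * np.eye(2)
B = np.eye(2)
A_hat, B_hat, K = naive_exploration_lqr(A, B)
print(np.round(A_hat, 2), np.round(B_hat, 2), sep="\n")
```

The design choice the sketch is meant to highlight is the absence of optimism or explicit confidence sets: the controller simply refits the model and acts greedily on it, while the injected noise keeps the input direction excited enough for the estimate of $B$ to improve. Measuring regret against the true optimal controller would require extra bookkeeping that is not shown here.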