Naive Exploration is Optimal for Online LQR
Authors: Max Simchowitz, Dylan Foster
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove new upper and lower bounds demonstrating that the optimal regret scales as $\widetilde{\Theta}(\sqrt{d_u^2 d_x T})$ |
| Researcher Affiliation | Academia | ¹UC Berkeley, ²Massachusetts Institute of Technology. Correspondence to: Max Simchowitz <msimchow@berkeley.edu>. |
| Pseudocode | Yes | Our main algorithm, Algorithm 1, is detailed in Appendix H. It is an ε-greedy scheme that takes advantage of this principle. The full pseudocode and analysis are deferred to Appendix H, but we sketch the intuition here. |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that its source code is open or publicly available. |
| Open Datasets | No | The paper is theoretical and focuses on mathematical proofs and bounds for online LQR. It does not describe experiments run on a specific dataset or provide access information for a public dataset for training. |
| Dataset Splits | No | The paper is theoretical and focuses on mathematical proofs and bounds. It does not describe empirical experiments involving dataset splits for validation. |
| Hardware Specification | No | The paper is theoretical and focuses on mathematical proofs and bounds. It does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and focuses on mathematical proofs and bounds. It does not describe any specific software dependencies with version numbers for experimental reproducibility. |
| Experiment Setup | No | The paper is theoretical and focuses on mathematical proofs and bounds. It does not describe an empirical experimental setup with hyperparameters or training configurations. |
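
The Pseudocode row describes the main algorithm only as an ε-greedy scheme, with details deferred to the paper's Appendix H. As a rough illustration of that idea, and not a reproduction of the paper's Algorithm 1, the sketch below plays a certainty-equivalent LQR controller with added Gaussian exploration noise and refits the dynamics at doubling intervals. All function names, the `t ** -0.25` noise schedule, the first update time of 32 steps, and the unit-variance process noise are assumptions made for this example.

```python
import numpy as np

def riccati_gain(A, B, Q, R, iters=500):
    """Iterate the discrete-time Riccati recursion; return K for the law u = K x."""
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        K = -np.linalg.solve(R + BtP @ B, BtP @ A)
        P = Q + A.T @ P @ (A + B @ K)
    return K

def estimate_dynamics(xs, us):
    """Least-squares fit of x_{t+1} ~ A x_t + B u_t.

    xs has shape (T + 1, dx) and us has shape (T, du).
    """
    Z = np.hstack([xs[:-1], us])                       # regressors [x_t, u_t]
    Theta, *_ = np.linalg.lstsq(Z, xs[1:], rcond=None)  # Theta = [A^T; B^T]
    dx = xs.shape[1]
    return Theta[:dx].T, Theta[dx:].T

def run_naive_exploration(A, B, Q, R, K0, T=2000, seed=0):
    """Certainty-equivalent control plus decaying exploration noise (illustrative)."""
    rng = np.random.default_rng(seed)
    dx, du = B.shape
    xs, us = [np.zeros(dx)], []
    K, next_update = K0, 32            # assumed first refit time
    A_hat, B_hat = None, None
    for t in range(1, T + 1):
        sigma = t ** -0.25             # assumed schedule: noise variance ~ 1/sqrt(t)
        u = K @ xs[-1] + sigma * rng.standard_normal(du)
        x_next = A @ xs[-1] + B @ u + rng.standard_normal(dx)
        us.append(u)
        xs.append(x_next)
        if t == next_update:           # refit the model and controller, then double
            A_hat, B_hat = estimate_dynamics(np.array(xs), np.array(us))
            K = riccati_gain(A_hat, B_hat, Q, R)
            next_update *= 2
    return A_hat, B_hat, K

if __name__ == "__main__":
    A, B = np.array([[0.9]]), np.array([[1.0]])
    Q, R = np.eye(1), np.eye(1)
    A_hat, B_hat, K = run_naive_exploration(A, B, Q, R, K0=np.zeros((1, 1)))
    print("estimated A:", A_hat, "estimated B:", B_hat, "gain K:", K)
```

The sketch assumes the initial gain `K0` stabilizes the true system (here `K0 = 0` works because the example `A` is already stable) and that `T` is at least 32 so a refit occurs. The true `A` and `B` are used only to simulate the system; the controller sees nothing but the observed states and inputs.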