Geometric Exploration for Online Control
Authors: Orestis Plevrakis, Elad Hazan
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our main result is a polynomial-time algorithm for the case of known cost function, that achieves n3 T-regret. This is the optimal dependence in the time-horizon and our result improves upon the previous best known bound of T 2/3. The main component of our algorithm is a novel geometric exploration strategy: we adaptively construct a sequence of barycentric spanners in an overparameterized policy space. Second, we consider the case of bandit feedback, for which we give the first polynomial-time algorithm with poly(n) T-regret, building on Stochastic Bandit Convex Optimization. |
| Researcher Affiliation | Collaboration | Orestis Plevrakis Princeton University orestisp@princeton.edu Elad Hazan Princeton University Google AI Princeton ehazan@princeton.edu |
| Pseudocode | Yes | Algorithm 1: Geometric Exploration for Control Input: estimates A0, B0 satisfying (A0 B0) (A B ) F ϵ. Initialize policy set M1 = M, matrix estimates ( b A1 b B1) = (A0 B0), disturbance estimates bwt = 0 for t 0. Set d = du dx H. Set t = 1 and observe x1. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is openly available. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on specific datasets, thus no information about public dataset availability is provided. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments on specific datasets, thus no information about train/validation/test splits is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention any specific software dependencies with version numbers for experimental setup. |
| Experiment Setup | No | The paper is theoretical and does not describe any specific experimental setup details, such as hyperparameters or training configurations. |