Geometric Exploration for Online Control

Authors: Orestis Plevrakis, Elad Hazan

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical Our main result is a polynomial-time algorithm for the case of known cost function, that achieves n3 T-regret. This is the optimal dependence in the time-horizon and our result improves upon the previous best known bound of T 2/3. The main component of our algorithm is a novel geometric exploration strategy: we adaptively construct a sequence of barycentric spanners in an overparameterized policy space. Second, we consider the case of bandit feedback, for which we give the first polynomial-time algorithm with poly(n) T-regret, building on Stochastic Bandit Convex Optimization.
Researcher Affiliation Collaboration Orestis Plevrakis Princeton University orestisp@princeton.edu Elad Hazan Princeton University Google AI Princeton ehazan@princeton.edu
Pseudocode Yes Algorithm 1: Geometric Exploration for Control Input: estimates A0, B0 satisfying (A0 B0) (A B ) F ϵ. Initialize policy set M1 = M, matrix estimates ( b A1 b B1) = (A0 B0), disturbance estimates bwt = 0 for t 0. Set d = du dx H. Set t = 1 and observe x1.
Open Source Code No The paper does not provide any statement or link indicating that the source code for the described methodology is openly available.
Open Datasets No The paper is theoretical and does not conduct experiments on specific datasets, thus no information about public dataset availability is provided.
Dataset Splits No The paper is theoretical and does not conduct experiments on specific datasets, thus no information about train/validation/test splits is provided.
Hardware Specification No The paper is theoretical and does not describe any specific hardware used for experiments.
Software Dependencies No The paper is theoretical and does not mention any specific software dependencies with version numbers for experimental setup.
Experiment Setup No The paper is theoretical and does not describe any specific experimental setup details, such as hyperparameters or training configurations.