reproducibilityindex.ai

Geometric Exploration for Online Control

Authors: Orestis Plevrakis, Elad Hazan

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	Our main result is a polynomial-time algorithm for the case of known cost function, that achieves n3 T-regret. This is the optimal dependence in the time-horizon and our result improves upon the previous best known bound of T 2/3. The main component of our algorithm is a novel geometric exploration strategy: we adaptively construct a sequence of barycentric spanners in an overparameterized policy space. Second, we consider the case of bandit feedback, for which we give the first polynomial-time algorithm with poly(n) T-regret, building on Stochastic Bandit Convex Optimization.
Researcher Affiliation	Collaboration	Orestis Plevrakis Princeton University orestisp@princeton.edu Elad Hazan Princeton University Google AI Princeton ehazan@princeton.edu
Pseudocode	Yes	Algorithm 1: Geometric Exploration for Control Input: estimates A0, B0 satisfying (A0 B0) (A B ) F ϵ. Initialize policy set M1 = M, matrix estimates ( b A1 b B1) = (A0 B0), disturbance estimates bwt = 0 for t 0. Set d = du dx H. Set t = 1 and observe x1.
Open Source Code	No	The paper does not provide any statement or link indicating that the source code for the described methodology is openly available.
Open Datasets	No	The paper is theoretical and does not conduct experiments on specific datasets, thus no information about public dataset availability is provided.
Dataset Splits	No	The paper is theoretical and does not conduct experiments on specific datasets, thus no information about train/validation/test splits is provided.
Hardware Specification	No	The paper is theoretical and does not describe any specific hardware used for experiments.
Software Dependencies	No	The paper is theoretical and does not mention any specific software dependencies with version numbers for experimental setup.
Experiment Setup	No	The paper is theoretical and does not describe any specific experimental setup details, such as hyperparameters or training configurations.