Optimal Rates for Bandit Nonstochastic Control

Authors: Y. Jennifer Sun, Stephen Newman, Elad Hazan

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We answer in the affirmative, giving an algorithm for bandit LQR and LQG which attains optimal regret (up to logarithmic factors) for both known and unknown systems.
Researcher Affiliation | Collaboration | Y. Jennifer Sun (Princeton University, Google DeepMind; ys7849@princeton.edu); Stephen Newman (Princeton University; sn9581@princeton.edu); Elad Hazan (Princeton University & Google DeepMind; ehazan@princeton.edu)
Pseudocode | Yes | Algorithm 1: Ellipsoidal BCO with memory (EBCO-M); Algorithm 2: Ellipsoidal Bandit Perturbation Controller (EBPC); Algorithm 3: System estimation via least squares (SysEst-LS)
Open Source Code | No | The paper does not include any explicit statement about releasing open-source code or a link to a code repository for the methodology described.
Open Datasets | No | The paper is theoretical and does not conduct empirical studies using datasets, hence no information on dataset availability or access is provided.
Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with datasets, thus no information on training/test/validation splits is provided.
Hardware Specification | No | The paper is theoretical and does not describe an experimental setup with hardware specifications.
Software Dependencies | No | The paper is theoretical and does not describe an experimental setup with specific software dependencies or version numbers.
Experiment Setup | No | The paper is theoretical and does not provide details about an experimental setup, such as hyperparameters or training settings.