Non-Stochastic Control with Bandit Feedback

Authors: Paula Gradu, John Hallman, Elad Hazan

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We now provide empirical results of our algorithms performance on different dynamical systems and under various noise distributions. In all figures, we average the results obtained over 25 runs and include the corresponding confidence intervals.
Researcher Affiliation | Collaboration | Paula Gradu (1,3), John Hallman (1,3), Elad Hazan (2,3). (1) Department of Mathematics, Princeton University; (2) Department of Computer Science, Princeton University; (3) Google AI Princeton. {pgradu,hallman,ehazan}@princeton.edu
Pseudocode | Yes | Algorithm 1 BCO with Memory; Algorithm 2 Bandit Perturbation Controller; Algorithm 3 System identification via random inputs; Algorithm 4 BPC with system identification
Open Source Code | Yes | Our algorithm implementation is available at [26]. [26] Google AI Princeton. Deluca. https://github.com/MinRegret/deluca, 2020.
Open Datasets | No | The paper uses linear dynamical systems defined by matrices A and B and various synthetic noise specifications (i.i.d Gaussian noise, Sinusoidal noise, Gaussian random walk). It does not provide access information for a publicly available or open dataset.
Dataset Splits | No | The paper mentions averaging results over 25 runs but does not specify training, validation, or test dataset splits, nor does it discuss cross-validation or other data partitioning methods.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions that 'Our algorithm implementation is available at [26]', but it does not specify any software names with version numbers (e.g., Python version, specific libraries like PyTorch or TensorFlow versions).
Experiment Setup | Yes | For both BPC and GPC we initialize K to be the infinite-horizon LQR solution given dynamics A and B in all of the settings below in order to observe the improvement provided by the two perturbation controllers relative to the classical approach.
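
The Experiment Setup entry states that K is initialized to the infinite-horizon LQR solution for dynamics A and B. The following is a minimal sketch of how such a gain could be computed, assuming a discrete-time system x_{t+1} = A x_t + B u_t + w_t with control u_t = -K x_t. The cost matrices Q and R, the double-integrator example system, and the function name `lqr_gain` are illustrative assumptions; none of them are taken from the paper or from the deluca repository.

```python
import numpy as np


def lqr_gain(A, B, Q, R, iters=10_000, tol=1e-10):
    """Infinite-horizon discrete-time LQR gain K (control law u = -K x).

    Iterates the Riccati recursion
        P <- Q + A^T P A - A^T P B (R + B^T P B)^{-1} B^T P A
    until P stops changing, then returns K = (R + B^T P B)^{-1} B^T P A.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    Q = np.asarray(Q, dtype=float)
    R = np.asarray(R, dtype=float)

    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        # Equivalent form of the Riccati update: Q + A^T P (A - B K).
        P_next = Q + A.T @ P @ (A - B @ K)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break

    BtP = B.T @ P
    return np.linalg.solve(R + BtP @ B, BtP @ A)


# Hypothetical example system (a double integrator), not from the paper.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)  # assumed state cost
R = np.eye(1)  # assumed control cost

K = lqr_gain(A, B, Q, R)

# Spectral radius of the closed-loop matrix A - B K; a value below 1
# means the LQR-initialized controller stabilizes the system.
rho = np.max(np.abs(np.linalg.eigvals(A - B @ K)))
print("K =", K)
print("closed-loop spectral radius =", rho)
```

If SciPy is available, `scipy.linalg.solve_discrete_are(A, B, Q, R)` solves the same Riccati equation directly and could replace the fixed-point loop. The iteration is used here only to keep the sketch dependent on NumPy alone.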