Non-Stochastic Control with Bandit Feedback
Authors: Paula Gradu, John Hallman, Elad Hazan
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now provide empirical results of our algorithms' performance on different dynamical systems and under various noise distributions. In all figures, we average the results obtained over 25 runs and include the corresponding confidence intervals. |
| Researcher Affiliation | Collaboration | Paula Gradu (1, 3), John Hallman (1, 3), Elad Hazan (2, 3). 1: Department of Mathematics, Princeton University; 2: Department of Computer Science, Princeton University; 3: Google AI Princeton. {pgradu,hallman,ehazan}@princeton.edu |
| Pseudocode | Yes | Algorithm 1 BCO with Memory; Algorithm 2 Bandit Perturbation Controller; Algorithm 3 System identification via random inputs; Algorithm 4 BPC with system identification |
| Open Source Code | Yes | Our algorithm implementation is available at [26]. [26] Google AI Princeton. Deluca. https://github.com/MinRegret/deluca, 2020. |
| Open Datasets | No | The paper uses linear dynamical systems defined by matrices A and B and various synthetic noise specifications (i.i.d. Gaussian noise, sinusoidal noise, Gaussian random walk). It does not provide access information for a publicly available or open dataset. |
| Dataset Splits | No | The paper mentions averaging results over 25 runs but does not specify training, validation, or test dataset splits, nor does it discuss cross-validation or other data partitioning methods. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper states that 'Our algorithm implementation is available at [26]', but it does not list software dependencies with version numbers (e.g., the Python version or versions of libraries such as PyTorch or TensorFlow). |
| Experiment Setup | Yes | For both BPC and GPC we initialize K to be the infinite-horizon LQR solution given dynamics A and B in all of the settings below in order to observe the improvement provided by the two perturbation controllers relative to the classical approach. |
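
The experiment-setup row describes initializing K to the infinite-horizon LQR solution for the dynamics A and B and evaluating under three synthetic noise types, with results averaged over 25 runs and reported with confidence intervals. The minimal NumPy sketch below illustrates only that LQR baseline; it is not the deluca implementation and does not implement BPC or GPC. The quadratic cost matrices Q = I and R = I, the noise scale, the sinusoid period, the horizon T, the double-integrator example system, the normal-approximation 95% interval, and all function names (`lqr_gain`, `make_noise`, `rollout`, `mean_and_ci`) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def lqr_gain(A, B, Q, R, iters=5000, tol=1e-10):
    """Infinite-horizon discrete-time LQR gain K for the control law u = -K x.

    Iterates the discrete algebraic Riccati equation
    P = Q + A^T P (A - B K), with K = (R + B^T P B)^{-1} B^T P A,
    until P stops changing (assumes (A, B) stabilizable).
    """
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        P_next = Q + A.T @ P @ (A - B @ K)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    BtP = B.T @ P
    return np.linalg.solve(R + BtP @ B, BtP @ A)


def make_noise(kind, T, n, rng, scale=0.1, period=20.0):
    """Disturbance sequence w_1..w_T of shape (T, n); scale and period are assumptions."""
    if kind == "gaussian":      # i.i.d. Gaussian noise
        return scale * rng.standard_normal((T, n))
    if kind == "sinusoidal":    # deterministic sinusoid, same in every coordinate
        t = np.arange(T)[:, None]
        return scale * np.sin(2 * np.pi * t / period) * np.ones((1, n))
    if kind == "random_walk":   # Gaussian random walk: cumulative sum of Gaussian steps
        return np.cumsum(scale * rng.standard_normal((T, n)), axis=0)
    raise ValueError(f"unknown noise kind: {kind}")


def rollout(A, B, K, W):
    """Average quadratic cost x^T x + u^T u of u = -K x on x_{t+1} = A x_t + B u_t + w_t."""
    x = np.zeros(A.shape[0])
    cost = 0.0
    for w in W:
        u = -K @ x
        cost += x @ x + u @ u
        x = A @ x + B @ u + w
    return cost / len(W)


def mean_and_ci(kind, A, B, K, T=200, runs=25, seed=0):
    """Mean cost over `runs` rollouts and a normal-approximation 95% half-width."""
    rng = np.random.default_rng(seed)
    costs = np.array([rollout(A, B, K, make_noise(kind, T, A.shape[0], rng))
                      for _ in range(runs)])
    half_width = 1.96 * costs.std(ddof=1) / np.sqrt(runs)
    return costs.mean(), half_width


if __name__ == "__main__":
    # Hypothetical example system: a discrete-time double integrator.
    A = np.array([[1.0, 1.0], [0.0, 1.0]])
    B = np.array([[0.0], [1.0]])
    K = lqr_gain(A, B, np.eye(2), np.eye(1))
    for kind in ("gaussian", "sinusoidal", "random_walk"):
        m, h = mean_and_ci(kind, A, B, K)
        print(f"{kind:12s} cost {m:.4f} +/- {h:.4f}")
```

Because the sinusoidal disturbance in this sketch is deterministic, its 25 runs are identical and the interval half-width is zero; the Gaussian and random-walk settings produce genuine run-to-run variation. A perturbation controller such as BPC or GPC would start from this K and learn additional disturbance-feedback terms on top of it.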