Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems
Authors: Marc Abeille, Alessandro Lazaric
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we report numerical simulations supporting the conjecture that our result extends to multi-dimensional systems. [...] Numerical simulations. Since all the results in Sect. 5 hold for n = 1, we try to simulate several random LQ systems of variable dimensionality and numerically estimate the probability of being optimistic (p_opt) in each of them. |
| Researcher Affiliation | Industry | ¹Criteo, Paris, France; ²Facebook AI Research, Paris, France. |
| Pseudocode | Yes | Figure 1: Thompson sampling algorithm for LQR |
| Open Source Code | No | The paper does not provide any statement or link for the open-source code of the described methodology. |
| Open Datasets | No | The numerical simulations describe generating random LQ systems rather than using a pre-existing, publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper does not describe specific dataset splits (e.g., training, validation, test percentages or counts) as it focuses on theoretical analysis and simulations of system parameters, not traditional dataset-based experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the numerical simulations. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers for reproducing the experiments. |
| Experiment Setup | Yes | We construct S by setting D = 20 J(θ∗). For different values of n and d, we sample θ∗ as θ∗[i, j] ∼ N(0, 1) independently and we run multiple trajectories of TS of length T = 500 steps. At each step t, we sample 1000 θ̃_t from the TS distribution (with rejection)... |
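
The experiment setup quoted above can be illustrated with a minimal sketch for the scalar case n = d = 1. It is not the authors' code, which the paper does not release. Several pieces are assumptions made only for illustration: unit cost weights `q = r = 1` and unit noise variance, a Gaussian sampling distribution centred on the true parameter with a hypothetical `scale` (the paper's TS distribution is centred on an estimate that changes over time), and the reading of "optimistic" as a sampled parameter whose optimal cost does not exceed that of the true system. The names `optimal_cost` and `estimate_p_opt` are hypothetical.

```python
import math
import random


def optimal_cost(a, b, q=1.0, r=1.0):
    """Optimal average cost J(theta) for the scalar system
    x' = a*x + b*u + noise (unit noise variance, an assumption)
    with stage cost q*x^2 + r*u^2.

    J is the positive root p of the scalar Riccati equation
    b^2 p^2 + (r - q b^2 - a^2 r) p - q r = 0.
    """
    if abs(b) < 1e-12:
        # Degenerate input gain: treated as inadmissible in this sketch.
        return math.inf
    c = r - q * b * b - a * a * r
    return (-c + math.sqrt(c * c + 4.0 * b * b * q * r)) / (2.0 * b * b)


def estimate_p_opt(theta_star, center, scale, n_samples=1000, rng=random):
    """Monte Carlo estimate of the probability of being optimistic.

    Samples are drawn from a Gaussian around `center` (assumed stand-in
    for the TS distribution) and rejected when their cost exceeds
    D = 20 * J(theta_star), mirroring the admissible set S.
    """
    j_star = optimal_cost(*theta_star)
    bound = 20.0 * j_star
    accepted = 0
    optimistic = 0
    while accepted < n_samples:
        a = rng.gauss(center[0], scale)
        b = rng.gauss(center[1], scale)
        j = optimal_cost(a, b)
        if j > bound:
            continue  # rejection step: sample falls outside S
        accepted += 1
        if j <= j_star:
            optimistic += 1
    return optimistic / n_samples


rng = random.Random(0)
# True parameter with independent N(0, 1) entries, as in the quoted setup.
theta_star = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
p_opt = estimate_p_opt(theta_star, center=theta_star, scale=0.5, rng=rng)
print(f"estimated p_opt: {p_opt:.3f}")
```

Extending this to multi-dimensional systems would replace the closed-form scalar root with a numerical Riccati solver, and a full reproduction would repeat the estimate at each of the T = 500 steps of a TS trajectory.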