Thompson Sampling Efficiently Learns to Control Diffusion Processes
Authors: Mohamad Kazem Shirani Faradonbeh, Mohamad Sadegh Shirani Faradonbeh, Mohsen Bayati
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our theoretical results through empirical simulations with real matrices. |
| Researcher Affiliation | Academia | Mohamad Kazem Shirani Faradonbeh, Department of Statistics, University of Georgia, Athens, GA 30602, mohamadksf@uga.edu; Mohamad Sadegh Shirani Faradonbeh, Graduate School of Business, Stanford University, Stanford, CA 94305, sshirani@stanford.edu; Mohsen Bayati, Graduate School of Business, Stanford University, Stanford, CA 94305, bayati@stanford.edu |
| Pseudocode | Yes | Algorithm 1: Stabilization under Uncertainty (...) Algorithm 2: Thompson Sampling for Efficient Control of Diffusion Processes |
| Open Source Code | No | The paper states in its ethics checklist that code is included to reproduce results (3a: 'Yes; See Section 6'). However, Section 6, 'Numerical Analysis', does not provide a direct link to a code repository or explicit instructions on where to find the source code. It only references a 'longer version of the paper [54]', which is another arXiv preprint. |
| Open Datasets | Yes | We empirically evaluate the theoretical results of Theorems 1 and 2 for the flight control of X-29A airplane at 2000 ft [49]. |
| Dataset Splits | No | The paper does not explicitly provide training, validation, or test dataset splits. It discusses simulations for a flight control problem using 'true drift matrices' and episodic learning, rather than traditional dataset splitting. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models. The ethics checklist also states 'No' for including compute resources. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers used for the experiments. It describes algorithms and theoretical foundations, and presents numerical analysis without specifying the software environment. |
| Experiment Setup | Yes | Further, we let $\Sigma_W = 0.25 I_p$, $Q_x = I_p$, and $Q_u = 0.1 I_q$, where $I_n$ is the $n$ by $n$ identity matrix. To update the diffusion process $x_t$ in (1), time-steps of length $10^{-3}$ are employed. Then, in Algorithm 1, we let $\sigma_w = 5$, $\kappa = \tau^{3/2}$, while $\tau$ varies from 4 to 20 seconds. The initial feedback $K$ is generated randomly. (...) On the right hand side of Figure 1, Algorithm 2 is executed for 600 seconds, for $\tau_n = 20 \times 1.1^n$. We compare TS with the Randomized Estimate algorithm [2] for 100 different repetitions. |
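
Since no code is linked, the following is only a minimal sketch of how the quoted setup could be simulated, not the authors' implementation. It assumes an Euler-Maruyama discretization of a linear diffusion dx = (A x + B u) dt + dW under a fixed linear feedback u = K x, with a step of 1e-3, noise covariance 0.25 I, and cost weights Q_x = I, Q_u = 0.1 I. The function names, the zero initial state, and the choice to start the index n at 0 are hypothetical; the X-29A drift matrices are not reproduced here, so the caller must supply A and B. The Thompson Sampling update itself is not shown.

```python
import numpy as np

def simulate_cost(A, B, K, horizon, dt=1e-3, noise_var=0.25, qu=0.1, seed=0):
    """Euler-Maruyama simulation of dx = (A x + B u) dt + dW with u = K x.

    Returns the final state and the accumulated quadratic cost
    integral of (x'x + qu * u'u) dt, i.e. Q_x = I and Q_u = qu * I.
    """
    rng = np.random.default_rng(seed)
    p = A.shape[0]
    x = np.zeros(p)                      # assumed initial state
    cost = 0.0
    n_steps = int(round(horizon / dt))
    for _ in range(n_steps):
        u = K @ x
        cost += (x @ x + qu * (u @ u)) * dt
        # Wiener increment with covariance noise_var * dt * I
        dw = rng.normal(scale=np.sqrt(noise_var * dt), size=p)
        x = x + (A @ x + B @ u) * dt + dw
    return x, cost

def update_times(n_updates, base=20.0, growth=1.1):
    """Times tau_n = base * growth**n; starting at n = 0 is an assumption."""
    return [base * growth ** n for n in range(n_updates)]
```

For example, `simulate_cost(-np.eye(2), np.eye(2), np.zeros((2, 2)), horizon=1.0)` runs one second of a placeholder stable system; these matrices are illustrative only and unrelated to the flight-control example.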