Thompson Sampling Efficiently Learns to Control Diffusion Processes

Authors: Mohamad Kazem Shirani Faradonbeh, Mohamad Sadegh Shirani Faradonbeh, Mohsen Bayati

NeurIPS 2022

Reproducibility Variable Result LLM Response
Research Type Experimental We validate our theoretical results through empirical simulations with real matrices.
Researcher Affiliation Academia Mohamad Kazem Shirani Faradonbeh, Department of Statistics, University of Georgia, Athens, GA 30602, mohamadksf@uga.edu; Mohamad Sadegh Shirani Faradonbeh, Graduate School of Business, Stanford University, Stanford, CA 94305, sshirani@stanford.edu; Mohsen Bayati, Graduate School of Business, Stanford University, Stanford, CA 94305, bayati@stanford.edu
Pseudocode Yes Algorithm 1: Stabilization under Uncertainty (...) Algorithm 2: Thompson Sampling for Efficient Control of Diffusion Processes
Open Source Code No The paper states in its ethics checklist that code is included to reproduce results (3a: 'Yes; See Section 6'). However, Section 6, 'Numerical Analysis', does not provide a direct link to a code repository or explicit instructions on where to find the source code. It only references a 'longer version of the paper [54]', which is another arXiv preprint.
Open Datasets Yes We empirically evaluate the theoretical results of Theorems 1 and 2 for the flight control of X-29A airplane at 2000 ft [49].
Dataset Splits No The paper does not explicitly provide training, validation, or test dataset splits. It discusses simulations for a flight control problem using 'true drift matrices' and episodic learning, rather than traditional dataset splitting.
Hardware Specification No The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models. The ethics checklist also states 'No' for including compute resources.
Software Dependencies No The paper does not list specific software dependencies with version numbers used for the experiments. It describes algorithms and theoretical foundations, and presents numerical analysis without specifying the software environment.
Experiment Setup Yes Further, we let Σ_W = 0.25 I_p, Q_x = I_p, and Q_u = 0.1 I_q, where I_n is the n by n identity matrix. To update the diffusion process x_t in (1), time-steps of length 10^{-3} are employed. Then, in Algorithm 1, we let σ_w = 5, κ = τ^{3/2}, while τ varies from 4 to 20 seconds. The initial feedback K is generated randomly. (...) On the right-hand side of Figure 1, Algorithm 2 is executed for 600 seconds, for τ_n = 20 × 1.1^n. We compare TS with the Randomized Estimate algorithm [2] for 100 different repetitions.
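The quoted setup can be illustrated with a minimal sketch. It is not the authors' code, and it rests on several assumptions: the update times are read as τ_n = 20 × 1.1^n with n starting at 0; the diffusion in (1) is taken to have the linear form dx = (A x + B u) dt + dW; and Σ_W = 0.25 I_p is treated as the Wiener covariance, giving a per-coordinate noise scale of 0.5. The names `update_times` and `euler_maruyama_step` are hypothetical, and the drift matrices `A` and `B` must be supplied by the reader because the X-29A matrices are not reproduced here.

```python
import math
import random

DT = 1e-3        # time-step length quoted in the setup
HORIZON = 600.0  # seconds for which Algorithm 2 is run in the quoted setup
NOISE_STD = 0.5  # assumption: sqrt(0.25), from Sigma_W = 0.25 * I_p


def update_times(horizon=HORIZON, base=20.0, growth=1.1):
    """Return the times base * growth**n (n = 0, 1, ...) not exceeding horizon.

    Assumption: n starts at 0 and each tau_n marks a posterior update time.
    """
    times = []
    n = 0
    while base * growth ** n <= horizon:
        times.append(base * growth ** n)
        n += 1
    return times


def euler_maruyama_step(x, u, A, B, noise_std, dt, rng):
    """Advance x by one Euler-Maruyama step of dx = (A x + B u) dt + dW.

    x and u are lists of floats; A and B are lists of rows (hypothetical
    placeholders for the true drift matrices). Each coordinate receives
    independent Gaussian noise with standard deviation noise_std * sqrt(dt).
    """
    p, q = len(x), len(u)
    new_x = []
    for i in range(p):
        drift = sum(A[i][j] * x[j] for j in range(p))
        drift += sum(B[i][k] * u[k] for k in range(q))
        noise = noise_std * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        new_x.append(x[i] + drift * dt + noise)
    return new_x


if __name__ == "__main__":
    rng = random.Random(0)
    taus = update_times()
    print(len(taus), "update times within", HORIZON, "seconds")
    # Toy 2-state, 1-input system, purely illustrative.
    A = [[0.0, 1.0], [0.0, 0.0]]
    B = [[0.0], [1.0]]
    x = [1.0, 0.0]
    for _ in range(int(1.0 / DT)):  # simulate one second with zero input
        x = euler_maruyama_step(x, [0.0], A, B, NOISE_STD, DT, rng)
    print(x)
```

With a step of 10^{-3}, a 600-second run takes 600,000 Euler-Maruyama steps per repetition, which is worth keeping in mind when reproducing the 100 repetitions.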