MOTS: Minimax Optimal Thompson Sampling
Authors: Tianyuan Jin, Pan Xu, Jieming Shi, Xiaokui Xiao, Quanquan Gu
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we experimentally compare our proposed algorithms MOTS and MOTS-J with existing algorithms for multi-armed bandit problems with Gaussian rewards. Our experiments demonstrate the superiority of MOTS over state-of-the-art bandit algorithms such as UCB (Auer et al., 2002a), MOSS (Audibert & Bubeck, 2009), and TS (Thompson, 1933) with Gaussian priors. |
| Researcher Affiliation | Academia | (1) School of Computing, National University of Singapore, Singapore; (2) Department of Computer Science, University of California, Los Angeles, USA; (3) Department of Computing, The Hong Kong Polytechnic University, Hong Kong. |
| Pseudocode | Yes | Algorithm 1 shows the pseudo-code of MOTS, with the clipped sampling distribution Di(t) formulated in the paper. Algorithm 2 shows the pseudo-code of MOTS-J. |
| Open Source Code | No | The paper does not provide a specific statement or link indicating the release of open-source code for the methodology described. |
| Open Datasets | No | The experiments use synthetically generated Gaussian distributions based on specified parameters (e.g., K=50, µ1=1, µ2=...=µ50=1-ϵ) rather than a pre-existing public dataset with explicit access information. |
| Dataset Splits | No | The paper describes sequential decision process experiments where rewards are generated on-the-fly, and therefore, it does not specify explicit train/validation/test dataset splits typical for static supervised learning problems. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions implementing algorithms and using specific mathematical distributions, but it does not list any specific software dependencies or library versions (e.g., Python 3.x, PyTorch 1.x) that would be needed to replicate the experiment. |
| Experiment Setup | Yes | In all the experiments, the total number of time steps T is set to 10^7. The parameter ρ for MOTS defined in Section 3.2 is set to 0.9999. Since we focus on Gaussian rewards, we set α = 2 in (2) for both MOTS and MOTS-J. Specifically, we consider three settings with different numbers of arms K and different mean rewards {µi} for i = 1, . . . , K: (1) K = 50, µ1 = 1, and µ2 = · · · = µ50 = 1 − ϵ, where ϵ varies in the range {0.2, 0.1, 0.05} for different experiments; |
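For concreteness, the Pseudocode row above refers to the clipped Thompson sampling step at the core of Algorithm 1 (MOTS). The sketch below is a minimal reconstruction of that step, assuming the clipped-Gaussian form the paper describes: a Gaussian sample with variance inflated by 1/ρ, truncated at a MOSS-style confidence threshold. The function name `mots_choose_arm` and the exact threshold expression are illustrative assumptions; since no open-source code was released (see the Open Source Code row), this is not the authors' implementation.

```python
import numpy as np

def mots_choose_arm(mu_hat, pulls, T, K, rho=0.9999, alpha=2.0, rng=None):
    """Pick an arm via a clipped-Gaussian Thompson sample (sketch).

    mu_hat : empirical mean reward of each arm, shape (K,)
    pulls  : number of times each arm has been pulled so far (all >= 1)
    T      : total horizon; K : number of arms
    """
    rng = rng if rng is not None else np.random.default_rng()
    # Gaussian sample with variance 1/(rho * T_i(t)); rho close to 1
    # matches the experiment setup (rho = 0.9999).
    theta = rng.normal(mu_hat, np.sqrt(1.0 / (rho * pulls)))
    # MOSS-style clipping threshold, with log+(x) = max(0, log x) and
    # alpha = 2 for Gaussian rewards, as in the Experiment Setup row.
    log_plus = np.maximum(0.0, np.log(T / (K * pulls)))
    tau = mu_hat + np.sqrt((alpha / pulls) * log_plus)
    # Play the arm with the largest clipped sample min(theta_i, tau_i).
    return int(np.argmax(np.minimum(theta, tau)))
```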
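Similarly, the Experiment Setup row maps onto a simple simulation harness. The driver below sketches setting (1) under assumed unit-variance Gaussian rewards (consistent with α = 2), reusing `mots_choose_arm` from the previous block; the horizon defaults to 10^5 rather than the paper's 10^7 so the sketch runs quickly.

```python
def run_gaussian_bandit(eps=0.1, K=50, T=100_000, seed=0):
    """Simulate setting (1): mu_1 = 1, mu_2 = ... = mu_K = 1 - eps (sketch)."""
    rng = np.random.default_rng(seed)
    mu = np.full(K, 1.0 - eps)
    mu[0] = 1.0  # unique optimal arm
    sums, pulls = np.zeros(K), np.zeros(K)
    for i in range(K):  # initialization: pull every arm once
        sums[i] += rng.normal(mu[i], 1.0)
        pulls[i] += 1.0
    regret = 0.0
    for _ in range(K, T):
        i = mots_choose_arm(sums / pulls, pulls, T, K, rng=rng)
        sums[i] += rng.normal(mu[i], 1.0)
        pulls[i] += 1.0
        regret += mu[0] - mu[i]
    return regret  # cumulative pseudo-regret

print(run_gaussian_bandit(eps=0.2))  # e.g., the eps = 0.2 configuration
```

Averaging such runs over independent seeds would yield the kind of regret comparison the Research Type row quotes, though the exact curves depend on details (reward variance, MOTS-J's alternative sampling distribution) given only in the paper.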