MOTS: Minimax Optimal Thompson Sampling
Authors: Tianyuan Jin, Pan Xu, Jieming Shi, Xiaokui Xiao, Quanquan Gu
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we experimentally compare our proposed algorithms MOTS and MOTS-J with existing algorithms for multi-armed bandit problems with Gaussian rewards. Our experiments demonstrate the superiority of MOTS over state-of-the-art bandit algorithms such as UCB (Auer et al., 2002a), MOSS (Audibert & Bubeck, 2009), and TS (Thompson, 1933) with Gaussian priors. |
| Researcher Affiliation | Academia | (1) School of Computing, National University of Singapore, Singapore; (2) Department of Computer Science, University of California, Los Angeles, USA; (3) Department of Computing, The Hong Kong Polytechnic University, Hong Kong. |
| Pseudocode | Yes | Algorithm 1 shows the pseudo-code of MOTS, with the clipped sampling distribution Di(t) formulated in the paper. Algorithm 2 shows the pseudo-code of MOTS-J. |
| Open Source Code | No | The paper does not provide a specific statement or link indicating the release of open-source code for the methodology described. |
| Open Datasets | No | The experiments use synthetically generated Gaussian distributions based on specified parameters (e.g., K=50, µ1=1, µ2=...=µ50=1-ϵ) rather than a pre-existing public dataset with explicit access information. |
| Dataset Splits | No | The paper describes sequential decision process experiments where rewards are generated on-the-fly, and therefore, it does not specify explicit train/validation/test dataset splits typical for static supervised learning problems. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions implementing algorithms and using specific mathematical distributions, but it does not list any specific software dependencies or library versions (e.g., Python 3.x, PyTorch 1.x) that would be needed to replicate the experiment. |
| Experiment Setup | Yes | In all the experiments, the total number of time steps T is set to 10^7. The parameter ρ for MOTS defined in Section 3.2 is set to 0.9999. Since we focus on Gaussian rewards, we set α = 2 in (2) for both MOTS and MOTS-J. Specifically, we consider three settings with different numbers of arms K and different mean rewards {µi} for i = 1, . . . , K: (1) K = 50, µ1 = 1, and µ2 = · · · = µ50 = 1 − ϵ, where ϵ varies in the range {0.2, 0.1, 0.05} for different experiments; |
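For concreteness, the Pseudocode row above refers to the clipped Thompson sampling step at the core of Algorithm 1 (MOTS). The sketch below is a minimal reconstruction of that step, assuming the clipped-Gaussian form the paper describes: a Gaussian sample with variance inflated by 1/ρ, truncated at a MOSS-style confidence threshold. The function name `mots_choose_arm` and the exact threshold expression are illustrative assumptions; since no open-source code was released (see the Open Source Code row), this is not the authors' implementation.

```python
import numpy as np

def mots_choose_arm(mu_hat, pulls, T, K, rho=0.9999, alpha=2.0, rng=None):
    """Pick an arm via a clipped-Gaussian Thompson sample (sketch).

    mu_hat : empirical mean reward of each arm, shape (K,)
    pulls  : number of times each arm has been pulled so far (all >= 1)
    T      : total horizon; K : number of arms
    """
    rng = rng if rng is not None else np.random.default_rng()
    # Gaussian sample with variance 1/(rho * T_i(t)); rho close to 1
    # matches the experiment setup (rho = 0.9999).
    theta = rng.normal(mu_hat, np.sqrt(1.0 / (rho * pulls)))
    # MOSS-style clipping threshold, with log+(x) = max(0, log x) and
    # alpha = 2 for Gaussian rewards, as in the Experiment Setup row.
    log_plus = np.maximum(0.0, np.log(T / (K * pulls)))
    tau = mu_hat + np.sqrt((alpha / pulls) * log_plus)
    # Play the arm with the largest clipped sample min(theta_i, tau_i).
    return int(np.argmax(np.minimum(theta, tau)))
```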
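Similarly, the Experiment Setup row maps onto a simple simulation harness. The driver below sketches setting (1) under assumed unit-variance Gaussian rewards (consistent with α = 2), reusing `mots_choose_arm` from the previous block; the horizon defaults to 10^5 rather than the paper's 10^7 so the sketch runs quickly.

```python
def run_gaussian_bandit(eps=0.1, K=50, T=100_000, seed=0):
    """Simulate setting (1): mu_1 = 1, mu_2 = ... = mu_K = 1 - eps (sketch)."""
    rng = np.random.default_rng(seed)
    mu = np.full(K, 1.0 - eps)
    mu[0] = 1.0  # unique optimal arm
    sums, pulls = np.zeros(K), np.zeros(K)
    for i in range(K):  # initialization: pull every arm once
        sums[i] += rng.normal(mu[i], 1.0)
        pulls[i] += 1.0
    regret = 0.0
    for _ in range(K, T):
        i = mots_choose_arm(sums / pulls, pulls, T, K, rng=rng)
        sums[i] += rng.normal(mu[i], 1.0)
        pulls[i] += 1.0
        regret += mu[0] - mu[i]
    return regret  # cumulative pseudo-regret

print(run_gaussian_bandit(eps=0.2))  # e.g., the eps = 0.2 configuration
```

Averaging such runs over independent seeds would yield the kind of regret comparison the Research Type row quotes, though the exact curves depend on details (reward variance, MOTS-J's alternative sampling distribution) given only in the paper.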