Thompson Sampling with Less Exploration is Fast and Optimal

Authors: Tianyuan Jin, Xianglin Yang, Xiaokui Xiao, Pan Xu

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'Empirical evaluations confirm the efficiency and optimality of ϵ-TS. In this section, we conduct experiments to show that the proposed algorithm ϵ-TS achieves comparable or better performance than state-of-the-art MAB algorithms.'
Researcher Affiliation | Academia | National University of Singapore and Duke University.
Pseudocode | Yes | 'Algorithm 1 shows the pseudo code for the proposed algorithm, ϵ-Exploring Thompson Sampling (denoted as ϵ-TS).' (A Python sketch of the sampling rule appears below the table.)
Open Source Code | No | The paper states 'We implemented all methods in Python.' but provides no link to, or explicit statement about, the availability of its source code.
Open Datasets | No | 'To evaluate all the methods, we generate datasets under 4 reward distributions presented in Table 1, and 2 choices of K (K = 10 and 50, respectively). The mean rewards are generated as follows.'
Dataset Splits | No | The paper reports no train/validation/test splits; evaluation is based on cumulative regret over simulated bandit runs rather than held-out data.
Hardware Specification | Yes | 'All experiments were conducted on a Linux machine equipped with 72 threads, powered by two 18-core Intel Xeon(R) Gold 6240 CPUs @ 2.60GHz and 376GB RAM.'
Software Dependencies | No | The paper states 'We implemented all methods in Python.' and 'The KL equations were solved using the scipy.optimize.newton function.' but gives no version numbers for Python or any other dependency. (A sketch of such a KL solve appears below the table.)
Experiment Setup | Yes | 'For Gaussian rewards, the variance is set to be 1, and for Gamma rewards, the shape parameter is chosen as α = 1. We set ϵ = 1/K for ϵ-TS throughout our experiments. For all algorithms, the experimental results are averaged over 1000 repetitions.' (A minimal harness matching this protocol appears below the table.)
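
The excerpts above pin down the core design choice of ϵ-TS: most of the time the algorithm exploits the empirical mean, and only with probability ϵ does it draw a posterior sample. As a reading aid only (Algorithm 1 in the paper is authoritative), here is a minimal Python sketch of one run under the stated Gaussian setting (unit variance, ϵ = 1/K); the per-arm coin flip and the N(μ̂_i, 1/n_i) posterior are assumptions of this sketch, not quotes from the paper.

```python
import numpy as np

def eps_ts_gaussian(means, horizon, rng, eps=None):
    """One run of an epsilon-exploring TS loop (sketch, not Algorithm 1 verbatim).

    Gaussian rewards with unit variance; the posterior for arm i is taken
    as N(mu_hat_i, 1/n_i). With probability eps an arm's index is a
    posterior sample, otherwise it is the empirical mean (exploitation).
    Returns the cumulative regret of the run.
    """
    means = np.asarray(means, dtype=float)
    K = len(means)
    if eps is None:
        eps = 1.0 / K                      # the paper's choice: eps = 1/K
    best = means.max()
    sums = np.zeros(K)                     # cumulative observed reward per arm
    counts = np.zeros(K)                   # number of pulls per arm
    regret = 0.0

    for t in range(horizon):
        if t < K:
            arm = t                        # pull each arm once to initialize
        else:
            mu_hat = sums / counts
            explore = rng.random(K) < eps  # per-arm coin flips (an assumption)
            theta = np.where(explore,
                             rng.normal(mu_hat, 1.0 / np.sqrt(counts)),
                             mu_hat)
            arm = int(np.argmax(theta))
        sums[arm] += rng.normal(means[arm], 1.0)
        counts[arm] += 1
        regret += best - means[arm]
    return regret
```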
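
The dependencies entry notes that 'The KL equations were solved using the scipy.optimize.newton function', without quoting the exact equation. A typical instance in bandit baselines is the KL-UCB index: the largest q with kl(μ̂, q) ≤ log(t)/n. The sketch below is a hedged guess at that use for Bernoulli rewards; the names bernoulli_kl and kl_ucb_index are ours, not the paper's.

```python
import numpy as np
from scipy.optimize import newton

def bernoulli_kl(p, q, tol=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, tol), 1.0 - tol)
    q = min(max(q, tol), 1.0 - tol)
    return p * np.log(p / q) + (1.0 - p) * np.log((1.0 - p) / (1.0 - q))

def kl_ucb_index(mu_hat, pulls, t):
    """Largest q in (mu_hat, 1) with kl(mu_hat, q) <= log(t) / pulls,
    found as a root with scipy.optimize.newton (secant method here,
    since no derivative is supplied)."""
    target = np.log(t) / pulls
    return newton(lambda q: bernoulli_kl(mu_hat, q) - target,
                  x0=min(mu_hat + 0.05, 1.0 - 1e-6))

# Example: index of an arm with empirical mean 0.4 after 50 pulls at t = 1000.
print(kl_ucb_index(0.4, 50, 1000))   # roughly 0.66
```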
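
Finally, the stated protocol averages results over 1000 repetitions. A minimal harness consistent with that protocol is sketched below, reusing eps_ts_gaussian from the first sketch. The mean-reward vector is a placeholder, since the paper's generation scheme ('The mean rewards are generated as follows') is not reproduced here; for Gamma rewards with the stated shape α = 1, rng.gamma(1.0, scale=mean) would give a reward with mean `mean`.

```python
import numpy as np

# Reuses eps_ts_gaussian from the first sketch above.
K, horizon, n_reps = 10, 10_000, 1000   # 1000 repetitions, as reported
rng = np.random.default_rng(0)

# Placeholder means: the paper's mean-generation scheme is not quoted above.
means = np.linspace(0.1, 0.9, K)

regrets = [eps_ts_gaussian(means, horizon, rng) for _ in range(n_reps)]
print(f"avg cumulative regret over {n_reps} runs: {np.mean(regrets):.1f}")
```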