Thompson Sampling for Bandits with Clustered Arms

Authors: Emil Carlsson, Devdatt Dubhashi, Fredrik D. Johansson

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We show, both theoretically and empirically, how exploiting a given cluster structure can significantly improve the regret and computational cost compared to using standard Thompson sampling. and Finally, we perform an empirical evaluation showing that our algorithms perform well compared to previously proposed algorithms for bandits with clustered arms.
Researcher Affiliation Academia Emil Carlsson , Devdatt Dubhashi , Fredrik D. Johansson Department of Computer Science and Engineering, Chalmers University {caremil, dubhashi, fredrik.johansson}@chalmers.se
Pseudocode Yes Algorithm 1 TSC and Algorithm 2 HTS
Open Source Code No No explicit statement or link providing concrete access to the source code for the methodology described in this paper was found.
Open Datasets No The paper uses synthetic data and data generated 'in the same way as in [Bouneffouf et al., 2019]', but does not provide concrete access information (e.g., specific links, DOIs, or direct citations to a publicly available dataset) for the data used in its experiments.
Dataset Splits No The paper describes synthetic data generation and simulation time steps, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) as would be common for static datasets.
Hardware Specification No No specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running the experiments were provided. The acknowledgements mention 'resources provided by the Swedish National Infrastructure for Computing (SNIC)' but without specifications.
Software Dependencies No No specific ancillary software details, such as library names with version numbers (e.g., Python 3.8, PyTorch 1.9), were found. The paper mentions various algorithms and methods but not their software implementations with versions.
Experiment Setup Yes We run all algorithms with there corresponding standard parameter (v = 1 for Lin TS and Lin TSC, c = 2 for Lin UCB and Lin UCBC). and We set the reward probability of the best arm in the optimal cluster to be 0.6 and for the worst arm in the optimal cluster we set it to be 0.6 w . For the remaining A 2 arms in the optimal cluster we draw the reward probability from U(0.6 w , 0.6) for each arm.