Competing for Shareable Arms in Multi-Player Multi-Armed Bandits

Authors: Renzhe Xu, Haotian Wang, Xingxuan Zhang, Bo Li, Peng Cui

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We finally validate the effectiveness of the method in extensive synthetic experiments.
Researcher Affiliation | Academia | (1) Department of Computer Science and Technology, Tsinghua University, Beijing, China; (2) School of Economics and Management, Tsinghua University, Beijing, China.
Pseudocode | Yes | Algorithm 1 Selfish MPMAB with Averaging Allocation (SMAA) ... Algorithm 2 SMAA (without knowledge of N and rank) ... Algorithm 3 Musical Chairs (Rosenski et al., 2016)
Open Source Code | Yes | The source code is available at https://github.com/windxrz/SMAA.
Open Datasets | No | The paper describes a 'Data-generating process' where 'the beta distribution and Bernoulli distribution' are used to 'randomly sample the two shape parameters α and β uniformly in [0, 5]' or 'randomly sample the probability parameter p uniformly in [0, 1]'. This indicates synthetic data generation, not the use of a publicly available dataset with concrete access information.
Dataset Splits | No | The paper describes synthetic experiments but does not provide specific training, validation, or test dataset splits. It mentions 'Data-generating process' for creating the data but not how it was partitioned for training and evaluation.
Hardware Specification | No | The paper states 'We validate the effectiveness of the proposed SMAA method through synthetic experiments' but does not provide any specific details about the hardware (e.g., GPU models, CPU types) used for these experiments.
Software Dependencies | No | The paper does not explicitly list specific software dependencies with their version numbers (e.g., Python, PyTorch, specific libraries used for implementation).
Experiment Setup | Yes | SMAA (Ours). We use a hyper-parameter β to control the strength between exploration and exploitation. Specifically, the KL index in Equation (10) is modified as $\hat{b}_{j,k}(t) = \sup\{q \ge \hat{\mu}_{j,k}(t) : \tau_{j,k}(t)\,\mathrm{kl}(\hat{\mu}_{j,k}(t), q) \le \beta f(t)\}$. We search the hyper-parameter $\beta \in \{0.01, 0.02, 0.05, 0.1, 0.2, 0.5\}$. ... SMAA (Ours) when N and rank are unknown. Besides the hyper-parameter β in the exploration-exploitation phase, we introduce another hyper-parameter for this method in the Musical Chairs phase. Specifically, we set the parameter $T_0$ in Algorithm 3 as $\eta \cdot 50 K^2 \log(4T)$ and η is searched in {0.01, 0.02, 0.05, 0.1, 0.2, 0.5}.
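
The data-generating process quoted under Open Datasets can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names, the `family` argument, and the redraw of an exact zero shape parameter are assumptions made here, and the excerpt does not say how many arms or players were used or how rewards are shared.

```python
import random


def _positive_uniform(rng, high):
    """Draw uniformly from [0, high], redrawing an exact 0.0.

    Assumption: random.betavariate needs strictly positive shapes,
    so the measure-zero endpoint 0.0 is rejected.
    """
    value = rng.uniform(0.0, high)
    while value == 0.0:
        value = rng.uniform(0.0, high)
    return value


def sample_arm_parameters(num_arms, family, rng):
    """Draw one parameter set per arm (hypothetical helper)."""
    params = []
    for _ in range(num_arms):
        if family == "beta":
            # Two shape parameters, each uniform in [0, 5] as quoted.
            params.append((_positive_uniform(rng, 5.0),
                           _positive_uniform(rng, 5.0)))
        elif family == "bernoulli":
            # Probability parameter p uniform in [0, 1] as quoted.
            params.append(rng.uniform(0.0, 1.0))
        else:
            raise ValueError(f"unknown family: {family}")
    return params


def draw_reward(param, family, rng):
    """Draw a single reward in [0, 1] from one arm's distribution."""
    if family == "beta":
        alpha, beta = param
        return rng.betavariate(alpha, beta)
    return 1.0 if rng.random() < param else 0.0
```

Passing an explicit `random.Random(seed)` instance as `rng` keeps repeated runs reproducible, which matters because the card reports no fixed dataset.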
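
The modified KL index quoted under Experiment Setup is a supremum over q, which can be approximated numerically. The sketch below assumes that kl is the Bernoulli KL divergence (the usual choice in KL-UCB-style indices) and takes the value of f(t) as an argument, since the excerpt does not define f. The function names and the bisection approach are illustrative choices, not taken from the paper or its repository.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def kl_index(mu_hat, pulls, beta, f_t, iters=50):
    """Approximate sup{q >= mu_hat : pulls * kl(mu_hat, q) <= beta * f_t}.

    mu_hat: empirical mean of the arm, in [0, 1]
    pulls:  number of observations of the arm (tau in the quoted formula)
    beta:   exploration hyper-parameter from the quoted search grid
    f_t:    value of f(t), supplied by the caller (assumption)
    """
    if pulls == 0:
        return 1.0  # no data yet: the constraint holds for every q
    budget = beta * f_t
    lo, hi = mu_hat, 1.0
    # kl(mu_hat, q) is non-decreasing in q on [mu_hat, 1], so bisection applies.
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if pulls * bernoulli_kl(mu_hat, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo
```

A smaller β shrinks the budget on the right-hand side, so the index stays closer to the empirical mean and the method explores less.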
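
For a sense of scale, the Musical Chairs length T0 = η · 50 K² log(4T) can be evaluated over the quoted η grid. The sketch assumes a natural logarithm and rounds up to a whole number of rounds; both choices, the function name, and the example values of K and T are assumptions made for illustration.

```python
import math

# Search grid quoted for both beta and eta.
GRID = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5]


def musical_chairs_rounds(eta, K, T):
    """T0 = eta * 50 * K^2 * log(4T), rounded up (rounding is an assumption)."""
    return math.ceil(eta * 50 * K ** 2 * math.log(4 * T))


if __name__ == "__main__":
    # Hypothetical setting: K = 5, T = 1000.
    for eta in GRID:
        print(eta, musical_chairs_rounds(eta, K=5, T=1000))
```

With these example values, η = 0.01 gives 104 rounds, so the scaling factor keeps the Musical Chairs phase short relative to the horizon.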