Achieving Near-Optimal Individual Regret & Low Communications in Multi-Agent Bandits

Authors: Xuchuang Wang, Lin Yang, Yu-Zhen Janice Chen, Xutong Liu, Mohammad Hajiesmaili, Don Towsley, John C.S. Lui

Venue: ICLR 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We also conduct simulations to illustrate the advantage of our algorithm by comparing it to other known baselines." (Section 5, Numerical Simulations) |
| Researcher Affiliation | Academia | Xuchuang Wang, Department of Computer Science and Engineering, The Chinese University of Hong Kong (xuchuangw@gmail.com); Lin Yang, School of Intelligence Science and Technology, Nanjing University (linyang@nju.edu.cn); Yu-Zhen Janice Chen, College of Information and Computer Sciences, University of Massachusetts Amherst (yuzhenchen@cs.umass.edu); Xutong Liu, Department of Computer Science and Engineering, The Chinese University of Hong Kong (liuxt@cse.cuhk.edu.hk); Mohammad Hajiesmaili & Don Towsley, College of Information and Computer Sciences, University of Massachusetts Amherst ({hajiesmaili, towsley}@cs.umass.edu); John C.S. Lui, Department of Computer Science and Engineering, The Chinese University of Hong Kong (cslui@cse.cuhk.edu.hk) |
| Pseudocode | Yes | "Algorithm 1: The UCB-TCOM Algorithm (for each agent)" |
| Open Source Code | No | The paper does not provide links to open-source code for the described methodology, nor does it state that the code is publicly available. |
| Open Datasets | Yes | "Each arm is associated with a Bernoulli reward random variable whose mean is uniformly randomly taken from Ad-Clicks (Avito, 2015)." |
| Dataset Splits | No | The paper describes a simulation setup for a bandit problem in which rewards are drawn dynamically; it does not define explicit training, validation, or test splits of a static dataset. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., CPU or GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not list software dependencies with version numbers. |
| Experiment Setup | Yes | "Unless otherwise stated, the experiments consist of M = 25 agents and K = 20 arms, communication set parameter α = 1.2, buffering ratio β = 2, and T = 30,000. Each arm is associated with a Bernoulli reward random variable whose mean is uniformly randomly taken from Ad-Clicks (Avito, 2015). All results are averaged over 50 trials and their standard deviations are plotted as shaded regions." |
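
To make the quoted experiment setup concrete, below is a minimal Python sketch of the simulation environment only: M = 25 agents, K = 20 Bernoulli arms, horizon T = 30,000, averaged over 50 trials. This is an illustration under stated assumptions, not the paper's code: each agent runs a plain independent UCB1 learner as a stand-in, since UCB-TCOM (Algorithm 1) and its communication protocol are not reproduced here; the uniform arm means are a placeholder for means sampled from the Ad-Clicks (Avito, 2015) data; and the parameters α = 1.2 and β = 2 belong to UCB-TCOM's communication schedule, so they appear below only as unused constants.

```python
import numpy as np

# Experiment constants quoted from the paper's setup (Section 5):
M, K, T = 25, 20, 30_000   # agents, arms, horizon
ALPHA, BETA = 1.2, 2       # UCB-TCOM communication/buffering parameters (not used in this sketch)
N_TRIALS = 50              # results averaged over 50 trials

rng = np.random.default_rng(0)

def run_trial(rng):
    # Placeholder: uniform arm means stand in for means sampled
    # from the Ad-Clicks (Avito, 2015) click-through-rate data.
    means = rng.uniform(0.0, 1.0, size=K)

    # Each of the M agents runs an independent UCB1 learner
    # (a stand-in for UCB-TCOM; all communication is omitted).
    counts = np.ones((M, K))                                   # one forced initial pull per arm
    sums = rng.binomial(1, means, size=(M, K)).astype(float)   # rewards from those initial pulls
    regret = np.zeros(T)
    best = means.max()

    for t in range(K, T):
        ucb = sums / counts + np.sqrt(2.0 * np.log(t + 1) / counts)
        arms = ucb.argmax(axis=1)                # each agent pulls its UCB-maximizing arm
        rewards = rng.binomial(1, means[arms])
        sums[np.arange(M), arms] += rewards
        counts[np.arange(M), arms] += 1
        regret[t] = regret[t - 1] + (best - means[arms]).sum()
    return regret

# Average cumulative group regret over trials, as in the paper's plots.
avg_regret = np.mean([run_trial(rng) for _ in range(N_TRIALS)], axis=0)
print(f"final average group regret after T={T}: {avg_regret[-1]:.1f}")
```

Swapping these independent UCB1 learners for UCB-TCOM, with its α-parameterized communication set and buffering ratio β, is what the paper's Section 5 simulations actually compare against baselines.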