Achieving Near-Optimal Individual Regret & Low Communications in Multi-Agent Bandits
Authors: Xuchuang Wang, Lin Yang, Yu-Zhen Janice Chen, Xutong Liu, Mohammad Hajiesmaili, Don Towsley, John C.S. Lui
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also conduct simulations to illustrate the advantage of our algorithm by comparing it to other known baselines. (Section 5, "Numerical Simulations") |
| Researcher Affiliation | Academia | Xuchuang Wang, Department of Computer Science and Engineering, The Chinese University of Hong Kong, xuchuangw@gmail.com; Lin Yang, School of Intelligence Science and Technology, Nanjing University, linyang@nju.edu.cn; Yu-Zhen Janice Chen, College of Information and Computer Sciences, University of Massachusetts Amherst, yuzhenchen@cs.umass.edu; Xutong Liu, Department of Computer Science and Engineering, The Chinese University of Hong Kong, liuxt@cse.cuhk.edu.hk; Mohammad Hajiesmaili and Don Towsley, College of Information and Computer Sciences, University of Massachusetts Amherst, {hajiesmaili, towsley}@cs.umass.edu; John C.S. Lui, Department of Computer Science and Engineering, The Chinese University of Hong Kong, cslui@cse.cuhk.edu.hk |
| Pseudocode | Yes | Algorithm 1 The UCB-TCOM Algorithm (for each agent) |
| Open Source Code | No | The paper does not provide any links to open-source code for the described methodology or explicitly state that the code is publicly available. |
| Open Datasets | Yes | Each arm is associated with a Bernoulli reward random variable whose mean is uniformly randomly taken from Ad-Clicks (Avito, 2015). |
| Dataset Splits | No | The paper describes a simulation setup for a bandit problem, where rewards are drawn dynamically. It does not define explicit training, validation, or test dataset splits in the traditional sense for a static dataset. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | Experiment setup. Unless otherwise stated, the experiments consist of M = 25 agents and K = 20 arms, communication set parameter α = 1.2, buffering ratio β = 2, and T = 30,000. Each arm is associated with a Bernoulli reward random variable whose mean is uniformly randomly taken from Ad-Clicks (Avito, 2015). All results are averaged over 50 trials and their standard deviations are plotted as shaded regions. |
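
Since no code is released, the setup in the Experiment Setup row can be sketched as a small simulation harness. The following is a minimal sketch under stated assumptions, not the authors' implementation: `SimConfig`, `draw_arm_means`, `ucb1_regret`, and `run_experiment` are hypothetical names, the `mean_pool` argument stands in for the Ad-Clicks rates (which are not reproduced here), and each agent runs plain UCB1 with no communication as a stand-in policy rather than UCB-TCOM. Only the default parameter values and the averaging over trials come from the table.

```python
import math
import random
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class SimConfig:
    # Defaults are the values quoted in the Experiment Setup row.
    num_agents: int = 25    # M
    num_arms: int = 20      # K
    alpha: float = 1.2      # communication set parameter (unused by the stand-in policy)
    beta: float = 2.0       # buffering ratio (unused by the stand-in policy)
    horizon: int = 30_000   # T
    num_trials: int = 50


def draw_arm_means(cfg, mean_pool, rng):
    """Pick each arm's Bernoulli mean uniformly at random from a pool of rates.

    `mean_pool` is an assumption: a caller-supplied list standing in for Ad-Clicks data.
    """
    return [rng.choice(mean_pool) for _ in range(cfg.num_arms)]


def ucb1_regret(means, horizon, rng):
    """Cumulative pseudo-regret of one independent UCB1 agent (stand-in, NOT UCB-TCOM)."""
    k = len(means)
    pulls = [0] * k
    totals = [0.0] * k
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once before using the index
        else:
            arm = max(
                range(k),
                key=lambda a: totals[a] / pulls[a] + math.sqrt(2 * math.log(t) / pulls[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli reward
        pulls[arm] += 1
        totals[arm] += reward
        regret += best - means[arm]
    return regret


def run_experiment(cfg, mean_pool, seed=0):
    """Return (mean, standard deviation) over trials of the average per-agent regret."""
    per_trial = []
    for trial in range(cfg.num_trials):
        rng = random.Random(seed + trial)  # one seeded generator per trial
        means = draw_arm_means(cfg, mean_pool, rng)
        agent_regrets = [ucb1_regret(means, cfg.horizon, rng) for _ in range(cfg.num_agents)]
        per_trial.append(statistics.fmean(agent_regrets))
    return statistics.fmean(per_trial), statistics.pstdev(per_trial)
```

Calling `run_experiment(SimConfig(), mean_pool)` with a list of click-through rates would follow the reported configuration, though in pure Python the full horizon is slow; a reduced `SimConfig(horizon=..., num_trials=...)` is more practical for a quick check. The returned standard deviation corresponds to the shaded regions mentioned in the setup.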