Achieving Near-Optimal Individual Regret & Low Communications in Multi-Agent Bandits

Authors: Xuchuang Wang, Lin Yang, Yu-Zhen Janice Chen, Xutong Liu, Mohammad Hajiesmaili, Don Towsley, John C.S. Lui

Venue: ICLR 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We also conduct simulations to illustrate the advantage of our algorithm by comparing it to other known baselines." (Section 5, Numerical Simulations) |
| Researcher Affiliation | Academia | Xuchuang Wang, Department of Computer Science and Engineering, The Chinese University of Hong Kong (xuchuangw@gmail.com); Lin Yang, School of Intelligence Science and Technology, Nanjing University (linyang@nju.edu.cn); Yu-Zhen Janice Chen, College of Information and Computer Sciences, University of Massachusetts Amherst (yuzhenchen@cs.umass.edu); Xutong Liu, Department of Computer Science and Engineering, The Chinese University of Hong Kong (liuxt@cse.cuhk.edu.hk); Mohammad Hajiesmaili & Don Towsley, College of Information and Computer Sciences, University of Massachusetts Amherst ({hajiesmaili, towsley}@cs.umass.edu); John C.S. Lui, Department of Computer Science and Engineering, The Chinese University of Hong Kong (cslui@cse.cuhk.edu.hk) |
| Pseudocode | Yes | "Algorithm 1: The UCB-TCOM Algorithm (for each agent)" |
| Open Source Code | No | The paper does not provide links to open-source code for the described methodology, nor does it state that the code is publicly available. |
| Open Datasets | Yes | "Each arm is associated with a Bernoulli reward random variable whose mean is uniformly randomly taken from Ad-Clicks (Avito, 2015)." |
| Dataset Splits | No | The paper describes a simulation setup for a bandit problem in which rewards are drawn dynamically; it does not define explicit training, validation, or test splits of a static dataset. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., CPU or GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not list software dependencies with version numbers. |
| Experiment Setup | Yes | "Unless otherwise stated, the experiments consist of M = 25 agents and K = 20 arms, communication set parameter α = 1.2, buffering ratio β = 2, and T = 30,000. Each arm is associated with a Bernoulli reward random variable whose mean is uniformly randomly taken from Ad-Clicks (Avito, 2015). All results are averaged over 50 trials and their standard deviations are plotted as shaded regions." |
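
To make the quoted experiment setup concrete, below is a minimal Python sketch of the simulation environment only: M = 25 agents, K = 20 Bernoulli arms, horizon T = 30,000, averaged over 50 trials. This is an illustration under stated assumptions, not the paper's code: each agent runs a plain independent UCB1 learner as a stand-in, since UCB-TCOM (Algorithm 1) and its communication protocol are not reproduced here; the uniform arm means are a placeholder for means sampled from the Ad-Clicks (Avito, 2015) data; and the parameters α = 1.2 and β = 2 belong to UCB-TCOM's communication schedule, so they appear below only as unused constants.

```python
import numpy as np

# Experiment constants quoted from the paper's setup (Section 5):
M, K, T = 25, 20, 30_000   # agents, arms, horizon
ALPHA, BETA = 1.2, 2       # UCB-TCOM communication/buffering parameters (not used in this sketch)
N_TRIALS = 50              # results averaged over 50 trials

rng = np.random.default_rng(0)

def run_trial(rng):
    # Placeholder: uniform arm means stand in for means sampled
    # from the Ad-Clicks (Avito, 2015) click-through-rate data.
    means = rng.uniform(0.0, 1.0, size=K)

    # Each of the M agents runs an independent UCB1 learner
    # (a stand-in for UCB-TCOM; all communication is omitted).
    counts = np.ones((M, K))                                   # one forced initial pull per arm
    sums = rng.binomial(1, means, size=(M, K)).astype(float)   # rewards from those initial pulls
    regret = np.zeros(T)
    best = means.max()

    for t in range(K, T):
        ucb = sums / counts + np.sqrt(2.0 * np.log(t + 1) / counts)
        arms = ucb.argmax(axis=1)                # each agent pulls its UCB-maximizing arm
        rewards = rng.binomial(1, means[arms])
        sums[np.arange(M), arms] += rewards
        counts[np.arange(M), arms] += 1
        regret[t] = regret[t - 1] + (best - means[arms]).sum()
    return regret

# Average cumulative group regret over trials, as in the paper's plots.
avg_regret = np.mean([run_trial(rng) for _ in range(N_TRIALS)], axis=0)
print(f"final average group regret after T={T}: {avg_regret[-1]:.1f}")
```

Swapping these independent UCB1 learners for UCB-TCOM, with its α-parameterized communication set and buffering ratio β, is what the paper's Section 5 simulations actually compare against baselines.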