Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Achieving Near-Optimal Individual Regret & Low Communications in Multi-Agent Bandits
Authors: Xuchuang Wang, Lin Yang, Yu-Zhen Janice Chen, Xutong Liu, Mohammad Hajiesmaili, Don Towsley, John C.S. Lui
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also conduct simulations to illustrate the advantage of our algorithm by comparing it to other known baselines. (Section 5, Numerical Simulations) |
| Researcher Affiliation | Academia | Xuchuang Wang, Department of Computer Science and Engineering, The Chinese University of Hong Kong; Lin Yang, School of Intelligence Science and Technology, Nanjing University; Yu-Zhen Janice Chen, College of Information and Computer Sciences, University of Massachusetts Amherst; Xutong Liu, Department of Computer Science and Engineering, The Chinese University of Hong Kong; Mohammad Hajiesmaili and Don Towsley, College of Information and Computer Sciences, University of Massachusetts Amherst; John C.S. Lui, Department of Computer Science and Engineering, The Chinese University of Hong Kong |
| Pseudocode | Yes | Algorithm 1 The UCB-TCOM Algorithm (for each agent) |
| Open Source Code | No | The paper does not provide any links to open-source code for the described methodology or explicitly state that the code is publicly available. |
| Open Datasets | Yes | Each arm is associated with a Bernoulli reward random variable whose mean is uniformly randomly taken from Ad-Clicks (Avito, 2015). |
| Dataset Splits | No | The paper describes a simulation setup for a bandit problem, where rewards are drawn dynamically. It does not define explicit training, validation, or test dataset splits in the traditional sense for a static dataset. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | Experiment setup. Unless otherwise stated, the experiments consist of M = 25 agents and K = 20 arms, communication set parameter α = 1.2, buffering ratio β = 2, and T = 30,000. Each arm is associated with a Bernoulli reward random variable whose mean is uniformly randomly taken from Ad-Clicks (Avito, 2015). All results are averaged over 50 trials and their standard deviations are plotted as shaded regions. |
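
Since no code is released, the quoted setup can only be approximated. The following is a minimal sketch of the simulation scaffold, not the paper's UCB-TCOM algorithm: each agent runs plain UCB1 with no communication (a swapped-in baseline), so α and β are unused, and arm means are drawn uniformly from [0, 1) as a stand-in for the Ad-Clicks data. The function names `ucb1_agent_regret`, `run_trial`, and `summarize` are hypothetical.

```python
import math
import random
import statistics


def ucb1_agent_regret(means, horizon, rng):
    """Pseudo-regret of one agent running plain UCB1 (assumed baseline, not UCB-TCOM)."""
    k = len(means)
    best = max(means)
    counts = [0] * k
    sums = [0.0] * k
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once before using the index
        else:
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret


def run_trial(num_agents=25, num_arms=20, horizon=30_000, seed=0):
    """One trial with the quoted M, K, T; returns the regret of each agent."""
    rng = random.Random(seed)
    # Assumption: uniform means replace the Ad-Clicks-derived means used in the paper.
    means = [rng.random() for _ in range(num_arms)]
    return [ucb1_agent_regret(means, horizon, rng) for _ in range(num_agents)]


def summarize(num_trials=50, **setup):
    """Mean and standard deviation over trials of the average per-agent regret."""
    per_trial = [
        statistics.fmean(run_trial(seed=s, **setup)) for s in range(num_trials)
    ]
    return statistics.fmean(per_trial), statistics.stdev(per_trial)
```

Calling `summarize()` with the defaults mirrors the quoted M = 25, K = 20, T = 30,000, and 50 trials, but it is slow in pure Python; a smaller call such as `summarize(num_trials=5, horizon=2_000)` is enough to check the scaffold.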