Cooperative Stochastic Bandits with Asynchronous Agents and Constrained Feedback
Authors: Lin Yang, Yu-Zhen Janice Chen, Stephen Pasteris, Mohammad Hajiesmaili, John C. S. Lui, Don Towsley
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 (Numerical Experiments): Our goal in this section is to numerically investigate the performance of AAE-LCB and compare it to that of AAE-AAE and CO-UCB (see Section 3.3), and show that AAE-LCB effectively resolves the challenge slow agents present, while neither AAE-AAE nor CO-UCB does so. More specifically, by comparing AAE-LCB with CO-UCB, our goal is to verify the importance of two-stage learning in the design of AAE-LCB, and by comparing AAE-LCB and AAE-AAE, our goal is to verify the importance of using LCB as the indexing policy for external arm selection. |
| Researcher Affiliation | Academia | Lin Yang, Yu-Zhen Janice Chen (University of Massachusetts Amherst; {linyang,yuzhenchen}@cs.umass.edu); Stephen Pasteris (University College London; stephen.pasteris@gmail.com); Mohammad H. Hajiesmaili (University of Massachusetts Amherst; hajiesmaili@cs.umass.edu); John C. S. Lui (Chinese University of Hong Kong; cslui@cse.cuhk.edu.hk); Don Towsley (University of Massachusetts Amherst; towsley@cs.umass.edu) |
| Pseudocode | Yes | Algorithm 1 AAE-LCB: A Cooperative Bandit Algorithm for Agent j in the FC-CMA2B setting (a hedged single-agent elimination sketch appears after this table) |
| Open Source Code | No | The paper does not contain any statement about releasing the source code for the described methodology or a link to a repository. |
| Open Datasets | Yes | Experimental setup. We assume there are K = 100 arms with Bernoulli rewards whose average rewards are taken uniformly at random from Ad-Clicks [1]. [1] Kaggle Avito Context Ad Clicks, 2015. https://www.kaggle.com/c/avito-context-ad-clicks |
| Dataset Splits | No | The paper mentions using a dataset and running experiments but does not specify any training, validation, or test splits. It only describes the general setup of agents and arms. |
| Hardware Specification | No | The paper describes the experimental setup and results but does not specify any hardware details like CPU, GPU models, or memory. |
| Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers used for the experiments. |
| Experiment Setup | Yes | Experimental setup. We assume there are K = 100 arms with Bernoulli rewards whose average rewards are taken uniformly at random from Ad-Clicks [1]. In our experiments, we report the cumulative regret after 30,000 rounds, which corresponds to the number of decision rounds of the fastest agent. All reported values are averaged over 20 independent trials. There are 20 agents, each with 12 arms selected from among the set of K = 100 arms, divided into two categories: 10 fast agents, each with an action rate of 1, and 10 slow agents with varying action rates of less than 1. (A minimal simulation-setup sketch follows this table.) |
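
The Pseudocode row above refers to the paper's Algorithm 1 (AAE-LCB), whose full two-stage cooperative procedure is not reproduced in this report. As a rough illustration of the active-arm-elimination (AAE) building block it is named after, here is a minimal single-agent successive-elimination sketch with UCB/LCB tests. The function name, the confidence radius, and the phase structure are assumptions made for illustration; the cooperative, asynchronous, and LCB-based external-arm-selection parts of AAE-LCB are deliberately omitted.

```python
import numpy as np

def successive_elimination(means, horizon, seed=0):
    """Minimal single-agent active arm elimination with UCB/LCB tests.

    Illustrative sketch only, not the paper's Algorithm 1. Assumes
    horizon >= len(means) so every arm is sampled at least once
    before the first elimination test.
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    K = len(means)
    active = list(range(K))          # surviving candidate arms
    counts = np.zeros(K)             # number of pulls per arm
    sums = np.zeros(K)               # cumulative reward per arm
    best, regret, t = means.max(), 0.0, 0

    while t < horizon and active:
        # Phase: play every surviving arm once.
        for a in active:
            if t >= horizon:
                break
            sums[a] += rng.binomial(1, means[a])   # Bernoulli reward
            counts[a] += 1
            regret += best - means[a]
            t += 1
        idx = np.array(active)
        mu = sums[idx] / counts[idx]
        rad = np.sqrt(2.0 * np.log(max(t, 2)) / counts[idx])
        # Drop every arm whose UCB falls below the largest LCB.
        best_lcb = (mu - rad).max()
        active = [a for a, ucb in zip(active, mu + rad) if ucb >= best_lcb]

    return regret
```

For example, `successive_elimination(np.random.default_rng(1).uniform(size=100), 30_000)` runs one trial over K = 100 arms for 30,000 rounds.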
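
Similarly, for the Experiment Setup row, here is a minimal sketch of the stated configuration (K = 100 Bernoulli arms, 30,000 rounds, 20 trials, 20 agents with 12 arms each, split into 10 fast and 10 slow agents). The uniform stand-in for the Ad-Clicks-derived means, the particular slow action rates, and the probabilistic reading of "action rate" are all assumptions, not the paper's exact protocol.

```python
import numpy as np

rng = np.random.default_rng(1)

K = 100            # arms with Bernoulli rewards
HORIZON = 30_000   # decision rounds of the fastest agent
TRIALS = 20        # independent trials to average regret over

# The paper draws mean rewards from the Kaggle Ad-Clicks data; as a
# stand-in we sample them uniformly at random.
means = rng.uniform(0.0, 1.0, size=K)

N_AGENTS, ARMS_PER_AGENT = 20, 12
# 10 fast agents (action rate 1) and 10 slow agents with rates < 1;
# the exact slow rates vary across the paper's experiments, so the
# draws below are placeholders.
action_rates = np.concatenate([np.ones(10), rng.uniform(0.05, 0.5, size=10)])
agent_arms = [rng.choice(K, size=ARMS_PER_AGENT, replace=False)
              for _ in range(N_AGENTS)]

def acts_this_round(agent: int) -> bool:
    """One simple reading of 'action rate': an agent with rate p takes
    an action in a given global round with probability p."""
    return rng.random() < action_rates[agent]
```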