Cooperative Stochastic Bandits with Asynchronous Agents and Constrained Feedback
Authors: Lin Yang, Yu-Zhen Janice Chen, Stephen Pasteris, Mohammad Hajiesmaili, John C. S. Lui, Don Towsley
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 (Numerical Experiments): Our goal in this section is to numerically investigate the performance of AAE-LCB and compare it to that of AAE-AAE and CO-UCB (see Section 3.3), and show that AAE-LCB effectively resolves the challenge slow agents present, while neither AAE-AAE nor CO-UCB does so. More specifically, by comparing AAE-LCB with CO-UCB, our goal is to verify the importance of two-stage learning in the design of AAE-LCB, and by comparing AAE-LCB and AAE-AAE, our goal is to verify the importance of using LCB as the indexing policy for external arm selection. |
| Researcher Affiliation | Academia | Lin Yang, Yu-Zhen Janice Chen (University of Massachusetts Amherst; {linyang,yuzhenchen}@cs.umass.edu); Stephen Pasteris (University College London; stephen.pasteris@gmail.com); Mohammad H. Hajiesmaili (University of Massachusetts Amherst; hajiesmaili@cs.umass.edu); John C. S. Lui (Chinese University of Hong Kong; cslui@cse.cuhk.edu.hk); Don Towsley (University of Massachusetts Amherst; towsley@cs.umass.edu) |
| Pseudocode | Yes | Algorithm 1 AAE-LCB: A Cooperative Bandit Algorithm for Agent j in the FC-CMA2B setting (a hedged single-agent elimination sketch appears after this table) |
| Open Source Code | No | The paper does not contain any statement about releasing the source code for the described methodology or a link to a repository. |
| Open Datasets | Yes | Experimental setup. We assume there are K = 100 arms with Bernoulli rewards whose average rewards are taken uniformly at random from Ad-Clicks [1]. [1] Kaggle Avito Context Ad Clicks, 2015. https://www.kaggle.com/c/avito-context-ad-clicks |
| Dataset Splits | No | The paper mentions using a dataset and running experiments but does not specify any training, validation, or test splits. It only describes the general setup of agents and arms. |
| Hardware Specification | No | The paper describes the experimental setup and results but does not specify any hardware details like CPU, GPU models, or memory. |
| Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers used for the experiments. |
| Experiment Setup | Yes | Experimental setup. We assume there are K = 100 arms with Bernoulli rewards whose average rewards are taken uniformly at random from Ad-Clicks [1]. In our experiments, we report the cumulative regret after 30,000 rounds, which corresponds to the number of decision rounds of the fastest agent. All reported values are averaged over 20 independent trials. There are 20 agents, each with 12 arms selected from among the set of K = 100 arms, divided into two categories: 10 fast agents, each with an action rate of 1, and 10 slow agents with varying action rates of less than 1. (A minimal simulation-setup sketch follows this table.) |
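
The Pseudocode row above refers to the paper's Algorithm 1 (AAE-LCB), whose full two-stage cooperative procedure is not reproduced in this report. As a rough illustration of the active-arm-elimination (AAE) building block it is named after, here is a minimal single-agent successive-elimination sketch with UCB/LCB tests. The function name, the confidence radius, and the phase structure are assumptions made for illustration; the cooperative, asynchronous, and LCB-based external-arm-selection parts of AAE-LCB are deliberately omitted.

```python
import numpy as np

def successive_elimination(means, horizon, seed=0):
    """Minimal single-agent active arm elimination with UCB/LCB tests.

    Illustrative sketch only, not the paper's Algorithm 1. Assumes
    horizon >= len(means) so every arm is sampled at least once
    before the first elimination test.
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    K = len(means)
    active = list(range(K))          # surviving candidate arms
    counts = np.zeros(K)             # number of pulls per arm
    sums = np.zeros(K)               # cumulative reward per arm
    best, regret, t = means.max(), 0.0, 0

    while t < horizon and active:
        # Phase: play every surviving arm once.
        for a in active:
            if t >= horizon:
                break
            sums[a] += rng.binomial(1, means[a])   # Bernoulli reward
            counts[a] += 1
            regret += best - means[a]
            t += 1
        idx = np.array(active)
        mu = sums[idx] / counts[idx]
        rad = np.sqrt(2.0 * np.log(max(t, 2)) / counts[idx])
        # Drop every arm whose UCB falls below the largest LCB.
        best_lcb = (mu - rad).max()
        active = [a for a, ucb in zip(active, mu + rad) if ucb >= best_lcb]

    return regret
```

For example, `successive_elimination(np.random.default_rng(1).uniform(size=100), 30_000)` runs one trial over K = 100 arms for 30,000 rounds.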
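
Similarly, for the Experiment Setup row, here is a minimal sketch of the stated configuration (K = 100 Bernoulli arms, 30,000 rounds, 20 trials, 20 agents with 12 arms each, split into 10 fast and 10 slow agents). The uniform stand-in for the Ad-Clicks-derived means, the particular slow action rates, and the probabilistic reading of "action rate" are all assumptions, not the paper's exact protocol.

```python
import numpy as np

rng = np.random.default_rng(1)

K = 100            # arms with Bernoulli rewards
HORIZON = 30_000   # decision rounds of the fastest agent
TRIALS = 20        # independent trials to average regret over

# The paper draws mean rewards from the Kaggle Ad-Clicks data; as a
# stand-in we sample them uniformly at random.
means = rng.uniform(0.0, 1.0, size=K)

N_AGENTS, ARMS_PER_AGENT = 20, 12
# 10 fast agents (action rate 1) and 10 slow agents with rates < 1;
# the exact slow rates vary across the paper's experiments, so the
# draws below are placeholders.
action_rates = np.concatenate([np.ones(10), rng.uniform(0.05, 0.5, size=10)])
agent_arms = [rng.choice(K, size=ARMS_PER_AGENT, replace=False)
              for _ in range(N_AGENTS)]

def acts_this_round(agent: int) -> bool:
    """One simple reading of 'action rate': an agent with rate p takes
    an action in a given global round with probability p."""
    return rng.random() < action_rates[agent]
```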