Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Delay and Cooperation in Nonstochastic Bandits

Authors: Nicolò Cesa-Bianchi, Claudio Gentile, Yishay Mansour

JMLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We introduce Exp3-Coop, a cooperative version of the Exp3 algorithm, and prove that with K actions and N agents the average per-agent regret after T rounds is at most of order $\sqrt{\bigl(d + 1 + \frac{K}{N}\alpha_d\bigr)(T \ln K)}$, where $\alpha_d$ is the independence number of the d-th power of the communication graph G. We then show that for any connected graph, for $d = \sqrt{K}$ the regret bound is $K^{1/4}\sqrt{T}$, strictly better than the minimax regret $\sqrt{KT}$ for noncooperating agents.
Researcher Affiliation | Collaboration | Nicolò Cesa-Bianchi, EMAIL, Department of Computer Science & DSRC, Università degli Studi di Milano, 20133 Milano, Italy; Claudio Gentile, EMAIL, Google Research, New York, NY, USA; Yishay Mansour, EMAIL, Google Research and Tel-Aviv University, Tel-Aviv 6997801, Israel
Pseudocode | Yes | Our learning protocol is summarized in Figure 1, while Figure 2 contains a pictorial example. Our first algorithm, called Exp3-Coop (Cooperative Exp3), is described in Figure 3. The Exp3-Coop2 Algorithm. Parameters: undirected graph G = (V, E); learning rate η; exploration parameter δ > 0. The Exp3-Coop-Mix Algorithm. Parameters: undirected communication graph G = (V, E); maximal delay d; delay distribution D over {0, 1, ..., d − 1}; learning rate η > 0.
Open Source Code | No | The paper does not provide explicit statements or links indicating that source code for the described methodologies is openly available.
Open Datasets | No | The paper is theoretical and does not describe or use specific datasets for empirical evaluation. It refers to abstract 'action sets' and 'loss vectors' in its mathematical framework.
Dataset Splits | No | The paper is theoretical and does not perform experiments with datasets; therefore, there is no mention of dataset splits.
Hardware Specification | No | The paper focuses on theoretical analysis and algorithm design without performing empirical experiments, so no hardware specifications are provided.
Software Dependencies | No | The paper is theoretical and does not implement or run its algorithms, so no software dependencies with version numbers are mentioned.
Experiment Setup | No | The paper is theoretical and focuses on algorithm design and regret analysis, not empirical experimentation. It discusses algorithmic parameters (e.g., 'delay d', 'learning rate η', 'exploration parameter δ') in a theoretical context, but does not provide specific values for an experimental setup.
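The regret bound quoted under Research Type can be explored numerically. The Python sketch below is not from the paper: it evaluates the order-level expression with constants dropped and the ln K factor kept, and plugs in a generic upper bound $\alpha_d \le \lfloor N / (\lfloor d/2 \rfloor + 1) \rfloor$ that holds for any connected graph by a ball-packing argument (agents pairwise more than d hops apart have disjoint balls of radius $\lfloor d/2 \rfloor$, each holding at least $\lfloor d/2 \rfloor + 1$ agents). Assuming $N \ge \lfloor d/2 \rfloor + 1$ and $d \ge \sqrt{K}$, this gives $\frac{K}{N}\alpha_d < 2K/d \le 2\sqrt{K}$, so the bracket is $O(\sqrt{K})$ and the bound is of order $K^{1/4}\sqrt{T \ln K}$. The function names are hypothetical.

```python
import math


def independence_upper_bound(num_agents: int, d: int) -> int:
    """Upper bound on alpha_d, the independence number of G^d, valid for any
    connected graph G on num_agents vertices (ball-packing argument, not
    taken from the paper).

    Agents pairwise more than d hops apart have disjoint balls of radius
    r = d // 2, and in a connected graph each such ball holds at least r + 1
    vertices unless it already covers the whole graph (then alpha_d = 1).
    """
    r = d // 2
    return max(1, num_agents // (r + 1))


def coop_regret_order(K: int, N: int, d: int, alpha_d: int, T: int) -> float:
    """sqrt((d + 1 + (K / N) * alpha_d) * T * ln K), constants dropped."""
    return math.sqrt((d + 1 + K / N * alpha_d) * T * math.log(K))


def noncoop_minimax_order(K: int, T: int) -> float:
    """Minimax regret order sqrt(K * T) for a single, noncooperating agent."""
    return math.sqrt(K * T)


if __name__ == "__main__":
    K, N, T = 10_000, 1_000, 10**6
    d = math.isqrt(K)  # K is a perfect square here, so d = sqrt(K) = 100 exactly
    alpha = independence_upper_bound(N, d)
    print(coop_regret_order(K, N, d, alpha, T), noncoop_minimax_order(K, T))
```

With these illustrative values the cooperative expression comes out at roughly half of $\sqrt{KT}$. For small K the explicit ln K factor can reverse the comparison, which is consistent with the abstract's claim being about orders of growth.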
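To make the protocol listed under Pseudocode more concrete, here is a minimal Python simulation of a cooperative Exp3 in the spirit of Exp3-Coop. It is a sketch under stated assumptions, not the pseudocode of Figure 3: rounds are synchronous; messages travel one hop per round, so by round t an agent has heard about round t − d from every agent within d hops; all agents face the same oblivious loss sequence; there is no exploration term; and each agent uses the importance-weighted estimate $\ell/q$, where q is the probability that at least one agent within d hops played the action. All function and parameter names are hypothetical.

```python
import math
import random
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search: hop distance from source to every reachable agent."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def exp3_coop_sketch(adj, losses, K, d, eta, seed=0):
    """Simulate a cooperative Exp3 variant (hypothetical sketch, no exploration).

    adj:    dict mapping each agent to its neighbours (undirected, connected).
    losses: list of T loss vectors, each with K entries in [0, 1], shared by all agents.
    Returns the cumulative loss incurred by each agent.
    """
    rng = random.Random(seed)
    agents = list(adj)
    # Agents within d hops of v (v included): their round-s messages reach v by round s + d.
    nbhd = {v: [w for w, k in hop_distances(adj, v).items() if k <= d] for v in agents}
    log_w = {v: [0.0] * K for v in agents}
    history = []  # history[t][v] = (action played by v, distribution v used)
    total = {v: 0.0 for v in agents}
    for t, loss in enumerate(losses):
        record = {}
        for v in agents:
            top = max(log_w[v])  # shift log-weights for numerical stability
            weights = [math.exp(x - top) for x in log_w[v]]
            z = sum(weights)
            p = [x / z for x in weights]
            action = rng.choices(range(K), weights=p)[0]
            record[v] = (action, p)
            total[v] += loss[action]
        history.append(record)
        # Everything broadcast in round s = t - d has now reached every agent
        # within d hops, so each agent folds round s into its weights.
        s = t - d
        if s < 0:
            continue
        past = history[s]
        for v in agents:
            for i in range(K):
                if not any(past[w][0] == i for w in nbhd[v]):
                    continue  # nobody within d hops played i in round s: estimate is 0
                # q = P(at least one agent within d hops played i in round s)
                q = 1.0 - math.prod(1.0 - past[w][1][i] for w in nbhd[v])
                log_w[v][i] -= eta * losses[s][i] / q
    return total


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # 4 agents on a path graph
    T = 200
    losses = [[0.0, 1.0, 1.0] for _ in range(T)]  # action 0 is always best
    print(exp3_coop_sketch(path, losses, K=3, d=1, eta=0.2, seed=1))
```

With d = 0 each neighborhood is the agent itself and the update reduces to plain Exp3 without exploration; a larger d gives each agent more observations per round at the price of using older information.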