Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Delay and Cooperation in Nonstochastic Bandits
Authors: Nicolò Cesa-Bianchi, Claudio Gentile, Yishay Mansour
JMLR 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We introduce Exp3-Coop, a cooperative version of the Exp3 algorithm and prove that with K actions and N agents the average per-agent regret after T rounds is at most of order $\sqrt{\left(d + 1 + \frac{K}{N}\alpha_{\le d}\right)(T \ln K)}$, where $\alpha_{\le d}$ is the independence number of the d-th power of the communication graph G. We then show that for any connected graph, for $d = \sqrt{K}$ the regret bound is $K^{1/4}\sqrt{T}$, strictly better than the minimax regret $\sqrt{KT}$ for noncooperating agents. |
| Researcher Affiliation | Collaboration | Nicolò Cesa-Bianchi EMAIL Department of Computer Science & DSRC Università degli Studi di Milano 20133 Milano, Italy Claudio Gentile EMAIL Google Research New York, NY, USA Yishay Mansour EMAIL Google Research and Tel-Aviv University Tel-Aviv 6997801, Israel |
| Pseudocode | Yes | Our learning protocol is summarized in Figure 1, while Figure 2 contains a pictorial example. Our first algorithm, called Exp3-Coop (Cooperative Exp3) is described in Figure 3. The Exp3-Coop2 Algorithm Parameters: Undirected graph G = (V, E); learning rate η; exploration parameter δ > 0. The Exp3-Coop-Mix Algorithm Parameters: Undirected communication graph G = (V, E); maximal delay d; delay distribution D over {0, 1, ..., d - 1}; learning rate η > 0. |
| Open Source Code | No | The paper does not provide explicit statements or links indicating that source code for the described methodologies is openly available. |
| Open Datasets | No | The paper is theoretical and does not describe or use specific datasets for empirical evaluation. It refers to abstract 'action sets' and 'loss vectors' in its mathematical framework. |
| Dataset Splits | No | The paper is theoretical and does not perform experiments with datasets, so it does not mention dataset splits. |
| Hardware Specification | No | The paper focuses on theoretical analysis and algorithm design without performing empirical experiments, so no hardware specifications are provided. |
| Software Dependencies | No | The paper is theoretical and does not implement or run its algorithms, so no software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and regret analysis, not empirical experimentation. It discusses algorithmic parameters (e.g., 'delay d', 'learning rate η', 'exploration parameter δ') in a theoretical context, but does not provide specific values for an experimental setup. |
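
The regret bound quoted in the Research Type row can be made concrete with a small numeric comparison. The sketch below is illustrative only: it drops all constant factors, and the values of `T`, `K`, `N`, `d`, and `alpha_d` are hypothetical choices, not taken from the paper. It assumes a complete communication graph with d = 1, for which the independence number is 1.

```python
import math


def coop_bound(T, K, N, d, alpha_d):
    """Order of the Exp3-Coop average per-agent regret bound (constants dropped)."""
    return math.sqrt((d + 1 + (K / N) * alpha_d) * T * math.log(K))


def no_coop_bound(T, K):
    """Order of the minimax regret sqrt(K*T) for a noncooperating agent."""
    return math.sqrt(K * T)


# Hypothetical values chosen only for illustration.
T, K, N = 10_000, 16, 16
d = 1
alpha_d = 1  # assumption: complete graph, so the independence number is 1

coop = coop_bound(T, K, N, d, alpha_d)
solo = no_coop_bound(T, K)
print(f"cooperative bound (order): {coop:.1f}")
print(f"noncooperative bound (order): {solo:.1f}")
```

With these assumed values the cooperative expression is smaller than the noncooperative one, in line with the quoted claim. Because constants are omitted, the numbers indicate orders of growth rather than actual regret values.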