Gradient-free Online Learning in Continuous Games with Delayed Rewards
Authors: Amélie Héliou, Panayotis Mertikopoulos, Zhengyuan Zhou
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this general context, we derive new bounds for the agents' regret; furthermore, under a standard diagonal concavity assumption, we show that the induced sequence of play converges to Nash equilibrium (NE) with probability 1, even if the delay between choosing an action and receiving the corresponding reward is unbounded. |
| Researcher Affiliation | Collaboration | Criteo AI Lab; Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, 38000 Grenoble, France; Stern School of Business, NYU, and IBM Research. Correspondence to: Panayotis Mertikopoulos <panayotis.mertikopoulos@imag.fr>. |
| Pseudocode | Yes | Algorithm 1: Gradient-free Online Learning with Delayed feedback (GOLD), presented from the focal player's viewpoint; a hedged code sketch follows the table. |
| Open Source Code | No | The paper does not provide any link or other access to source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not describe experiments using datasets, so no information about public dataset availability is provided. |
| Dataset Splits | No | The paper is theoretical and does not describe experiments, so no information about training/test/validation splits is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe experiments, so no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe experiments that would require software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training settings. |
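Since the paper gives only pseudocode (Algorithm 1, GOLD) and no reference implementation, the following is a minimal, hypothetical sketch of a GOLD-style loop from a single player's viewpoint: play a perturbed pivot, receive rewards after an arbitrary delay, form one-point gradient estimates, and update the pivot. The function names (`gold_sketch`, `reward_fn`), the clipping-based projection, the FIFO delay bookkeeping, and the step-size/query-radius schedules are illustrative assumptions, not the paper's exact algorithm or tuning.

```python
import numpy as np

def sample_unit_sphere(d, rng):
    """Draw a direction uniformly from the unit sphere in R^d."""
    z = rng.standard_normal(d)
    return z / np.linalg.norm(z)

def gold_sketch(reward_fn, delays, x_init, radius, T,
                gamma=lambda t: 1.0 / np.sqrt(t),
                delta=lambda t: t ** (-0.25),
                seed=0):
    """Hypothetical single-player GOLD-style loop (not the paper's code):
    play a perturbed pivot, observe rewards only after a delay, build
    one-point gradient estimates, and update by projected gradient ascent.

    `reward_fn(t, x)` returns the scalar reward of action x at round t;
    `delays[t-1]` is how many rounds the round-t reward is held back.
    Both are placeholders standing in for the game environment.
    """
    rng = np.random.default_rng(seed)
    d = x_init.size
    pivot = x_init.astype(float)
    pending = {}  # arrival round -> list of (round, reward, direction, radius)
    for t in range(1, T + 1):
        dt = delta(t)
        z = sample_unit_sphere(d, rng)
        x = pivot + dt * z                       # perturbed query point
        r = reward_fn(t, x)                      # reward realized now...
        arrival = t + int(delays[t - 1])         # ...but observed only later
        pending.setdefault(arrival, []).append((t, r, z, dt))
        # Process whatever feedback happens to arrive at round t (possibly none).
        for (_, r_s, z_s, d_s) in pending.pop(t, []):
            v_hat = (d / d_s) * r_s * z_s        # one-point gradient estimate
            pivot = pivot + gamma(t) * v_hat     # ascent step on the reward
            pivot = np.clip(pivot, -radius, radius)  # stand-in for projection
    return pivot

# Toy usage: concave quadratic reward, fixed delay of 5 rounds.
if __name__ == "__main__":
    T = 2000
    x_star = gold_sketch(
        reward_fn=lambda t, x: -np.sum((x - 0.3) ** 2),
        delays=np.full(T, 5),
        x_init=np.zeros(2),
        radius=1.0,
        T=T,
    )
    print("approximate maximizer:", x_star)
```

In this toy run the reward is concave and the delay is constant, so the pivot should drift toward the maximizer at 0.3 in each coordinate; the paper's analysis covers the far more general delayed, multi-agent, diagonally concave setting.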