Gradient-free Online Learning in Continuous Games with Delayed Rewards

Authors: Amélie Héliou, Panayotis Mertikopoulos, Zhengyuan Zhou

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this general context, we derive new bounds for the agents' regret; furthermore, under a standard diagonal concavity assumption, we show that the induced sequence of play converges to Nash equilibrium (NE) with probability 1, even if the delay between choosing an action and receiving the corresponding reward is unbounded.
Researcher Affiliation | Collaboration | (1) Criteo AI Lab; (2) Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, 38000 Grenoble, France; (3) Stern School of Business, NYU, and IBM Research. Correspondence to: Panayotis Mertikopoulos <panayotis.mertikopoulos@imag.fr>.
Pseudocode | Yes | Algorithm 1: gradient-free online learning with delayed feedback (GOLD) [focal player view]
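For intuition about what an algorithm of this kind looks like, the following is a minimal, hypothetical sketch of one-point gradient-free learning with delayed rewards. It is not the authors' GOLD algorithm; the fixed delay, the toy concave reward, and the unit-ball feasible set are illustrative assumptions, and the one-point gradient estimator is the standard single-query construction from the bandit-optimization literature.

```python
import numpy as np

# Hypothetical sketch, NOT the paper's Algorithm 1: one-point gradient-free
# ascent with a fixed reward delay. All constants and the reward function
# are illustrative assumptions.

rng = np.random.default_rng(0)
d = 2                      # action dimension (assumption)
x = np.zeros(d)            # current base action; feasible set: unit ball
delta, eta = 0.1, 0.05     # query radius and step size (assumptions)
DELAY = 3                  # fixed feedback delay in rounds (assumption)
pending = {}               # round -> (base action, perturbation) awaiting reward


def reward(a):
    # Toy concave reward, maximized at a = (0.5, 0.5); illustrative only.
    return 1.0 - np.sum((a - 0.5) ** 2)


for t in range(1, 201):
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)           # uniform random direction on the sphere
    pending[t] = (x.copy(), u)       # the player queries x + delta * u

    arrived = t - DELAY              # reward for round `arrived` comes in now
    if arrived in pending:
        x_old, u_old = pending.pop(arrived)
        r = reward(x_old + delta * u_old)
        g_hat = (d / delta) * r * u_old   # one-point gradient estimate
        x = x + eta * g_hat               # gradient ascent step
        nrm = np.linalg.norm(x)
        if nrm > 1.0:                     # project back onto the unit ball
            x /= nrm
```

The key structural point the sketch shares with delayed-feedback schemes is the `pending` buffer: updates are applied only when the corresponding (possibly late) reward arrives, using the perturbation that generated it.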
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described.
Open Datasets | No | The paper is purely theoretical and reports no experiments, so no public datasets are used or referenced.
Dataset Splits | No | With no experiments reported, the paper gives no training/validation/test splits.
Hardware Specification | No | With no experiments reported, the paper mentions no hardware specifications.
Software Dependencies | No | With no experiments reported, the paper lists no software dependencies or version numbers.
Experiment Setup | No | With no experiments reported, the paper specifies no hyperparameters or training settings.