Online Reinforcement Learning in Stochastic Games

Authors: Chen-Yu Wei, Yi-Te Hong, Chi-Jen Lu

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | "We study online reinforcement learning in average-reward stochastic games (SGs). ... We propose the UCSG algorithm that achieves a sublinear regret compared to the game value when competing with an arbitrary opponent. This result improves previous ones under the same setting. The regret bound has a dependency on the diameter... If we let the opponent play an optimistic best response to the learner, UCSG finds an ε-maximin stationary policy with a sample complexity of O(poly(1/ε)), where ε is the gap to the best policy."
Researcher Affiliation | Academia | Chen-Yu Wei, Institute of Information Science, Academia Sinica, Taiwan (bahh723@iis.sinica.edu.tw); Yi-Te Hong, Institute of Information Science, Academia Sinica, Taiwan (ted0504@iis.sinica.edu.tw); Chi-Jen Lu, Institute of Information Science, Academia Sinica, Taiwan (cjlu@iis.sinica.edu.tw)
Pseudocode | Yes | Algorithm 1: UCSG
Open Source Code | No | The paper does not mention providing open-source code for the described methodology.
Open Datasets | No | The paper is theoretical and does not use or describe any datasets for training or evaluation.
Dataset Splits | No | The paper is theoretical and does not describe any dataset splits for training, validation, or testing.
Hardware Specification | No | The paper is theoretical and does not describe any experiments that would require hardware specifications.
Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup, including hyperparameters or system-level training settings.
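The paper provides pseudocode (Algorithm 1, UCSG) but no code. As a rough illustration of the kind of optimistic model estimation that UCRL-style algorithms such as UCSG rely on, the sketch below maintains empirical transition counts and an L1 confidence radius around the empirical distribution. The class name, method names, and the exact radius constant are illustrative assumptions, not the paper's; UCSG's actual confidence sets and maximin planning step are more involved.

```python
import math
from collections import defaultdict


class OptimisticModel:
    """Hypothetical sketch: empirical transition estimates plus an L1
    confidence radius, as used by optimism-based RL algorithms. Not the
    paper's exact construction."""

    def __init__(self, n_states, delta=0.05):
        self.n_states = n_states
        self.delta = delta  # confidence parameter
        # (state, action) -> next_state -> visit count
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, s, a, s_next):
        """Record one observed transition."""
        self.counts[(s, a)][s_next] += 1

    def estimate(self, s, a):
        """Return the empirical transition distribution and an L1 radius.

        Weissman-style bound: with high probability,
        ||p - p_hat||_1 <= sqrt(2 * S * ln(2/delta) / n).
        """
        n = max(1, sum(self.counts[(s, a)].values()))
        p_hat = [self.counts[(s, a)][s2] / n for s2 in range(self.n_states)]
        radius = math.sqrt(2 * self.n_states * math.log(2 / self.delta) / n)
        return p_hat, radius


# Usage: estimate transitions for one (state, action) pair from 4 samples.
model = OptimisticModel(n_states=2)
for s_next in [0, 0, 1, 0]:
    model.observe(0, 0, s_next)
p_hat, radius = model.estimate(0, 0)
print(p_hat)  # [0.75, 0.25]
```

The radius shrinks as 1/sqrt(n), so the set of plausible models tightens with more data; an optimistic planner then picks the model in this set that maximizes the learner's (maximin) value.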