Online Reinforcement Learning in Stochastic Games
Authors: Chen-Yu Wei, Yi-Te Hong, Chi-Jen Lu
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study online reinforcement learning in average-reward stochastic games (SGs). ... We propose the UCSG algorithm that achieves a sublinear regret compared to the game value when competing with an arbitrary opponent. This result improves previous ones under the same setting. The regret bound has a dependency on the diameter... If we let the opponent play an optimistic best response to the learner, UCSG finds an ε-maximin stationary policy with a sample complexity of O(poly(1/ε)), where ε is the gap to the best policy. |
| Researcher Affiliation | Academia | Chen-Yu Wei, Institute of Information Science, Academia Sinica, Taiwan (bahh723@iis.sinica.edu.tw); Yi-Te Hong, Institute of Information Science, Academia Sinica, Taiwan (ted0504@iis.sinica.edu.tw); Chi-Jen Lu, Institute of Information Science, Academia Sinica, Taiwan (cjlu@iis.sinica.edu.tw) |
| Pseudocode | Yes | Algorithm 1 UCSG |
| Open Source Code | No | The paper does not mention providing open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not use or describe any datasets for training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not describe any dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any experiments that would require hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup, including hyperparameters or system-level training settings. |
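The table above records that the paper provides pseudocode (Algorithm 1, UCSG) but no code, datasets, or experiments. For readers unfamiliar with the regret notion the abstract refers to, the following is a minimal, hypothetical sketch of the generic optimism-in-the-face-of-uncertainty loop that UCSG-style algorithms build on, reduced to a toy two-action setting. It is not the paper's algorithm: the reward means, horizon, and confidence-bonus form are all illustrative assumptions, chosen only to show how acting on upper confidence bounds keeps regret sublinear in the horizon `T`.

```python
import math
import random

random.seed(0)

# Toy stand-in for an unknown environment: two actions with unknown
# expected rewards (illustrative values, not from the paper).
TRUE_MEANS = [0.4, 0.6]
T = 5000  # horizon

counts = [0, 0]   # times each action was played
sums = [0.0, 0.0] # cumulative observed reward per action
total_reward = 0.0

for t in range(1, T + 1):
    # Optimism: score each action by an upper confidence bound on its mean.
    ucb = []
    for a in range(2):
        if counts[a] == 0:
            ucb.append(float("inf"))  # force initial exploration
        else:
            mean = sums[a] / counts[a]
            bonus = math.sqrt(2 * math.log(t) / counts[a])
            ucb.append(mean + bonus)
    a = ucb.index(max(ucb))  # play the optimistically best action

    # Observe a noisy reward and update empirical statistics.
    r = TRUE_MEANS[a] + random.uniform(-0.1, 0.1)
    counts[a] += 1
    sums[a] += r
    total_reward += r

# Regret: shortfall versus always playing the best fixed action.
regret = T * max(TRUE_MEANS) - total_reward
print(f"regret after T={T} steps: {regret:.1f}")
```

Because the confidence bonus shrinks as each action is sampled, the suboptimal action is played only O(log T) times, so the regret grows sublinearly in T, which is the qualitative property the paper's bound establishes in the far harder stochastic-game setting.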