Online Reinforcement Learning in Stochastic Games
Authors: Chen-Yu Wei, Yi-Te Hong, Chi-Jen Lu
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study online reinforcement learning in average-reward stochastic games (SGs). ... We propose the UCSG algorithm that achieves a sublinear regret compared to the game value when competing with an arbitrary opponent. This result improves previous ones under the same setting. The regret bound has a dependency on the diameter... If we let the opponent play an optimistic best response to the learner, UCSG finds an ε-maximin stationary policy with a sample complexity of O(poly(1/ε)), where ε is the gap to the best policy. |
| Researcher Affiliation | Academia | Chen-Yu Wei, Institute of Information Science, Academia Sinica, Taiwan (bahh723@iis.sinica.edu.tw); Yi-Te Hong, Institute of Information Science, Academia Sinica, Taiwan (ted0504@iis.sinica.edu.tw); Chi-Jen Lu, Institute of Information Science, Academia Sinica, Taiwan (cjlu@iis.sinica.edu.tw) |
| Pseudocode | Yes | Algorithm 1 UCSG |
| Open Source Code | No | The paper does not mention providing open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not use or describe any datasets for training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not describe any dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any experiments that would require hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup, including hyperparameters or system-level training settings. |
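The table above records that the paper provides pseudocode (Algorithm 1, UCSG) but no code, datasets, or experiments. For readers unfamiliar with the regret notion the abstract refers to, the following is a minimal, hypothetical sketch of the generic optimism-in-the-face-of-uncertainty loop that UCSG-style algorithms build on, reduced to a toy two-action setting. It is not the paper's algorithm: the reward means, horizon, and confidence-bonus form are all illustrative assumptions, chosen only to show how acting on upper confidence bounds keeps regret sublinear in the horizon `T`.

```python
import math
import random

random.seed(0)

# Toy stand-in for an unknown environment: two actions with unknown
# expected rewards (illustrative values, not from the paper).
TRUE_MEANS = [0.4, 0.6]
T = 5000  # horizon

counts = [0, 0]   # times each action was played
sums = [0.0, 0.0] # cumulative observed reward per action
total_reward = 0.0

for t in range(1, T + 1):
    # Optimism: score each action by an upper confidence bound on its mean.
    ucb = []
    for a in range(2):
        if counts[a] == 0:
            ucb.append(float("inf"))  # force initial exploration
        else:
            mean = sums[a] / counts[a]
            bonus = math.sqrt(2 * math.log(t) / counts[a])
            ucb.append(mean + bonus)
    a = ucb.index(max(ucb))  # play the optimistically best action

    # Observe a noisy reward and update empirical statistics.
    r = TRUE_MEANS[a] + random.uniform(-0.1, 0.1)
    counts[a] += 1
    sums[a] += r
    total_reward += r

# Regret: shortfall versus always playing the best fixed action.
regret = T * max(TRUE_MEANS) - total_reward
print(f"regret after T={T} steps: {regret:.1f}")
```

Because the confidence bonus shrinks as each action is sampled, the suboptimal action is played only O(log T) times, so the regret grows sublinearly in T, which is the qualitative property the paper's bound establishes in the far harder stochastic-game setting.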