Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Online Reinforcement Learning in Stochastic Games
Authors: Chen-Yu Wei, Yi-Te Hong, Chi-Jen Lu
NeurIPS 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study online reinforcement learning in average-reward stochastic games (SGs). ... We propose the UCSG algorithm that achieves a sublinear regret compared to the game value when competing with an arbitrary opponent. This result improves previous ones under the same setting. The regret bound has a dependency on the diameter... If we let the opponent play an optimistic best response to the learner, UCSG finds an ε-maximin stationary policy with a sample complexity of O (poly(1/ε)), where ε is the gap to the best policy. |
| Researcher Affiliation | Academia | Chen-Yu Wei Institute of Information Science Academia Sinica, Taiwan EMAIL Yi-Te Hong Institute of Information Science Academia Sinica, Taiwan EMAIL Chi-Jen Lu Institute of Information Science Academia Sinica, Taiwan EMAIL |
| Pseudocode | Yes | Algorithm 1 UCSG |
| Open Source Code | No | The paper does not mention providing open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not use or describe any datasets for training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not describe any dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any experiments that would require hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup, including hyperparameters or system-level training settings. |