Towards Tight Bounds on the Sample Complexity of Average-reward MDPs
Authors: Yujia Jin, Aaron Sidford
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove new upper and lower bounds for sample complexity of finding an ϵ-optimal policy of an infinite-horizon average-reward Markov decision process (MDP) given access to a generative model. (The generative-model access pattern is sketched after the table.) |
| Researcher Affiliation | Academia | Yujia Jin¹, Aaron Sidford¹. ¹Management Science and Engineering, Stanford University, CA, United States. Correspondence to: Yujia Jin <yujiajin@stanford.edu>. |
| Pseudocode | No | The paper describes algorithms but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper is purely theoretical and does not release, link to, or mention any source code. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on datasets, so there is no mention of publicly available or open datasets. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments on datasets, so there is no mention of training/validation/test splits. |
| Hardware Specification | No | The paper is purely theoretical and does not describe any experimental setup or hardware used. |
| Software Dependencies | No | The paper is purely theoretical and does not describe any experimental setup or software dependencies with versions. |
| Experiment Setup | No | The paper is purely theoretical and does not describe an experimental setup with hyperparameters or system-level training settings. |
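
The first row quotes the paper's setting: finding an ϵ-optimal policy of an infinite-horizon average-reward MDP given access to a generative model. As a reading aid only, here is a minimal Python sketch of what such generative-model access looks like: an oracle that, given any state-action pair, returns a sampled next state and the reward. The toy MDP, the `generative_model` oracle, and the Monte Carlo evaluation below are illustrative assumptions, not the authors' algorithm, which the paper presents only in prose and analysis.

```python
# Minimal sketch (assumed, not from the paper) of generative-model access
# to an average-reward MDP, plus a naive Monte Carlo policy evaluation.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 3, 2

# Hypothetical MDP: P[s, a] is a distribution over next states,
# R[s, a] a deterministic reward in [0, 1].
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

def generative_model(s, a):
    """One oracle call: draw s' ~ P(. | s, a) and return (s', r(s, a))."""
    return rng.choice(n_states, p=P[s, a]), R[s, a]

def average_reward(policy, s0=0, horizon=100_000):
    """Monte Carlo estimate of a stationary policy's long-run average
    reward, spending `horizon` oracle calls along one trajectory."""
    s, total = s0, 0.0
    for _ in range(horizon):
        s, r = generative_model(s, policy[s])
        total += r
    return total / horizon

# Example: a myopic policy that maximizes immediate reward (illustration only).
policy = np.argmax(R, axis=1)
print(f"estimated average reward: {average_reward(policy):.3f}")
```

The sample complexity the paper bounds is, informally, the number of such oracle calls needed before an algorithm can return a policy whose long-run average reward is within ϵ of optimal.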