Efficiently Solving MDPs with Stochastic Mirror Descent
Authors: Yujia Jin, Aaron Sidford
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We present a unified framework based on primal-dual stochastic mirror descent for approximately solving infinite-horizon Markov decision processes (MDPs) given a generative model. When applied to an average-reward MDP with A_tot total actions and mixing time bound t_mix our method computes an ϵ-optimal policy with an expected Õ(t_mix² A_tot ϵ⁻²) samples from the state-transition matrix, removing the ergodicity dependence of prior art. When applied to a γ-discounted MDP with A_tot total actions our method computes an ϵ-optimal policy with an expected Õ((1−γ)⁻⁴ A_tot ϵ⁻²) samples, improving over previous primal-dual methods and matching the state-of-the-art up to a (1−γ)⁻¹ factor. Both methods are model-free, update state values and policies simultaneously, and run in time linear in the number of samples taken. We achieve these results through a more general stochastic mirror descent framework for solving bilinear saddle-point problems with simplex and box domains and we demonstrate the flexibility of this framework by providing further applications to constrained MDPs. (Abstract) and Table 1 gives a comparison of sample complexity between our methods and prior methods for computing an ϵ-approximate policy in DMDPs and AMDPs given a generative model. (Page 2) |
| Researcher Affiliation | Academia | Yujia Jin¹, Aaron Sidford¹; ¹Management Science and Engineering (MS&E), Stanford University, Stanford, United States. Correspondence to: Yujia Jin <yujiajin@stanford.edu>. (Page 1) |
| Pseudocode | Yes | Algorithm 1 SMD for mixing AMDP / DMDPs (Page 3) and Algorithm 2 SMD for ℓ∞-ℓ1 game (Page 4); a minimal illustrative sketch of one SMD step for such a game appears after the table |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper discusses theoretical sample complexity bounds for MDPs given a generative model, but does not refer to specific real-world datasets for training or provide access information for any dataset. |
| Dataset Splits | No | The paper does not describe any empirical experiments on datasets, thus no training, validation, or test splits are mentioned. |
| Hardware Specification | No | The paper is theoretical and focuses on sample complexity analysis; it does not mention any specific hardware used for experiments or computations. |
| Software Dependencies | No | The paper is theoretical and does not provide details on specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any empirical experiments; therefore, it does not provide details on experimental setup or hyperparameter values. |
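
For context on the pseudocode noted above, the following is a minimal sketch of primal-dual stochastic mirror descent for a bilinear saddle-point problem over box and simplex domains, the ℓ∞-ℓ1 game structure that the paper's framework builds on. The payoff matrix `A`, the box `[0, 1]^n`, the fixed step sizes, and the simple coordinate-sampling gradient estimators used here are illustrative assumptions, not the paper's tuned parameters or estimators.

```python
import numpy as np

def smd_bilinear_game(A, b, n_iters=5000, eta_x=0.01, eta_y=0.01, seed=0):
    """Sketch of primal-dual stochastic mirror descent for the bilinear game
        min_{x in [0,1]^n}  max_{y in simplex_m}  y^T (A x - b).

    Step sizes, the [0,1]^n box, and the sampling scheme are hypothetical
    choices for illustration only.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.full(n, 0.5)          # box-domain (minimization) iterate
    y = np.full(m, 1.0 / m)      # simplex-domain (maximization) iterate
    x_avg, y_avg = np.zeros(n), np.zeros(m)

    for _ in range(n_iters):
        # Unbiased estimate of the x-gradient A^T y: sample a row index i ~ y.
        i = rng.choice(m, p=y)
        g_x = A[i, :]

        # Unbiased estimate of the y-gradient A x - b: sample a column j
        # uniformly and rescale, so E[n * A[:, j] * x[j]] = A x.
        j = rng.integers(n)
        g_y = n * A[:, j] * x[j] - b

        # Euclidean mirror step on the box domain (projection = clipping).
        x = np.clip(x - eta_x * g_x, 0.0, 1.0)

        # Entropic mirror step on the simplex (multiplicative weights).
        y = y * np.exp(eta_y * g_y)
        y /= y.sum()

        x_avg += x
        y_avg += y

    # Averaged iterates approximate a saddle point of the bilinear game.
    return x_avg / n_iters, y_avg / n_iters
```

Averaging the iterates is the standard way to extract an approximate equilibrium from mirror descent; the paper's MDP algorithms additionally convert the simplex-side (dual) variable into a policy, a step not shown in this sketch.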