Efficiently Solving MDPs with Stochastic Mirror Descent

Authors: Yujia Jin, Aaron Sidford

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We present a unified framework based on primal-dual stochastic mirror descent for approximately solving infinite-horizon Markov decision processes (MDPs) given a generative model. When applied to an average-reward MDP with A_tot total actions and mixing time bound t_mix, our method computes an ϵ-optimal policy with an expected Õ(t_mix² A_tot ϵ⁻²) samples from the state-transition matrix, removing the ergodicity dependence of prior art. When applied to a γ-discounted MDP with A_tot total actions, our method computes an ϵ-optimal policy with an expected Õ((1−γ)⁻⁴ A_tot ϵ⁻²) samples, improving over previous primal-dual methods and matching the state-of-the-art up to a (1−γ)⁻¹ factor. Both methods are model-free, update state values and policies simultaneously, and run in time linear in the number of samples taken. We achieve these results through a more general stochastic mirror descent framework for solving bilinear saddle-point problems with simplex and box domains, and we demonstrate the flexibility of this framework by providing further applications to constrained MDPs. (Abstract) and Table 1 gives a comparison of sample complexity between our methods and prior methods for computing an ϵ-approximate policy in DMDPs and AMDPs given a generative model. (Page 2)
Researcher Affiliation | Academia | Yujia Jin and Aaron Sidford, Management Science and Engineering (MS&E), Stanford University, Stanford, United States. Correspondence to: Yujia Jin <yujiajin@stanford.edu>. (Page 1)
Pseudocode | Yes | Algorithm 1 SMD for mixing AMDP / DMDPs (Page 3) and Algorithm 2 SMD for ℓ∞-ℓ₁ game (Page 4)
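The paper's core primitive is stochastic mirror descent on a bilinear saddle-point problem with a box domain for one player and a simplex domain for the other. The sketch below is an illustrative toy version of that setup, not the paper's Algorithm 1 or 2: the function name, step sizes, and sampling scheme are assumptions, and the gradient estimators are deliberately simplified (sample a row by the simplex variable's distribution, sample a column uniformly). It shows only the characteristic structure: a Euclidean projected step on the box variable and an entropic (multiplicative-weights) step on the simplex variable, with iterate averaging.

```python
import numpy as np

def smd_box_simplex(A, b, steps=20000, eta_x=0.01, eta_y=0.01, seed=0):
    """Toy stochastic mirror descent for the bilinear game
        min_{x in [0,1]^n} max_{y in simplex_m}  y^T (A x - b).
    Hypothetical sketch; step sizes and estimators are illustrative."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.full(n, 0.5)          # box iterate
    y = np.full(m, 1.0 / m)      # simplex iterate
    x_sum = np.zeros(n)
    y_sum = np.zeros(m)
    for _ in range(steps):
        # Unbiased estimate of grad_x = A^T y: sample a row index i ~ y.
        i = rng.choice(m, p=y)
        g_x = A[i]
        # Unbiased estimate of grad_y = A x - b: sample a column j uniformly.
        j = rng.integers(n)
        g_y = n * A[:, j] * x[j] - b
        # Euclidean step + projection onto the box [0, 1]^n.
        x = np.clip(x - eta_x * g_x, 0.0, 1.0)
        # Entropic mirror step on the simplex (multiplicative weights).
        y = y * np.exp(eta_y * g_y)
        y /= y.sum()
        x_sum += x
        y_sum += y
    # Return averaged iterates, the standard output of saddle-point SMD.
    return x_sum / steps, y_sum / steps
```

The averaged pair approximates a saddle point; in the paper, such a pair is then rounded to an approximately optimal MDP policy, a step omitted here.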
Open Source Code | No | The paper does not provide any explicit statements or links indicating the availability of open-source code for the described methodology.
Open Datasets | No | The paper discusses theoretical sample complexity bounds for MDPs given a generative model, but does not refer to specific real-world datasets for training or provide access information for any dataset.
Dataset Splits | No | The paper does not describe any empirical experiments on datasets, and thus no training, validation, or test splits are mentioned.
Hardware Specification | No | The paper is theoretical and focuses on sample complexity analysis; it does not mention any specific hardware used for experiments or computations.
Software Dependencies | No | The paper is theoretical and does not provide details on specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe any empirical experiments; therefore, it does not provide details on experimental setup or hyperparameter values.