reproducibilityindex.ai

A Theory of Regularized Markov Decision Processes

Authors: Matthieu Geist, Bruno Scherrer, Olivier Pietquin

ICML 2019

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We propose a general theory of regularized Markov Decision Processes that generalizes these approaches in two directions: we consider a larger class of regularizers, and we consider the general modified policy iteration approach, encompassing both policy iteration and value iteration. The core building blocks of this theory are a notion of regularized Bellman operator and the Legendre-Fenchel transform, a classical tool of convex optimization. This approach allows for error propagation analyses of general algorithmic schemes of which (possibly variants of) classical algorithms such as Trust Region Policy Optimization, Soft Q-learning, Stochastic Actor Critic or Dynamic Policy Programming are special cases. All proofs are provided in the appendix.
Researcher Affiliation	Collaboration	1 Google Research, Brain Team. 2 Université de Lorraine, CNRS, Inria, IECL, F-54000 Nancy, France.
Pseudocode	No	The paper presents mathematical formulations of algorithms, such as equation (1) for regularized modified policy iteration, but it does not include formal pseudocode blocks or an 'Algorithm' section.
Open Source Code	No	The paper does not provide any statements about releasing code or links to a code repository for the described methodology.
Open Datasets	No	The paper is theoretical and does not present experiments with datasets. It refers to 'observed transitions (s_i, a_i, r_i, s'_i)' in the context of describing how existing algorithms (like Soft Q-learning) work in general, but not as data used in the authors' own experiments.
Dataset Splits	No	The paper is theoretical and does not involve empirical experiments requiring dataset splits for training, validation, or testing.
Hardware Specification	No	The paper is theoretical and does not describe any empirical experiments, therefore no hardware specifications are mentioned.
Software Dependencies	No	The paper is theoretical and does not describe any empirical experiments, therefore no specific software dependencies with version numbers are mentioned.
Experiment Setup	No	The paper is theoretical and focuses on mathematical derivations and analyses of a general framework. It does not describe an experimental setup, hyperparameters, or training settings for empirical evaluation.
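
The Research Type row names two building blocks: a regularized Bellman operator and the Legendre-Fenchel transform. The following is a minimal sketch of how they fit together, not code from the paper (which releases none). It assumes one specific regularizer, the scaled negative entropy Ω(π) = τ Σ_a π(a) ln π(a), whose Legendre-Fenchel transform is the log-sum-exp τ ln Σ_a exp(q(a)/τ) and whose gradient is the softmax policy. The function names, the temperature `tau`, and the two-state MDP are hypothetical and chosen only for illustration.

```python
import math

def soft_max_value(q_values, tau):
    """Legendre-Fenchel transform of the scaled negative entropy:
    tau * log(sum_a exp(q(a) / tau)), computed in a numerically stable way."""
    m = max(q_values)
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))

def softmax_policy(q_values, tau):
    """Gradient of the transform above, i.e. the regularized greedy policy."""
    m = max(q_values)
    weights = [math.exp((q - m) / tau) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def regularized_value_iteration(P, R, gamma, tau, n_iters=500):
    """Repeatedly apply the entropy-regularized Bellman optimality operator.

    P[s][a] is a list of next-state probabilities; R[s][a] is a reward.
    Returns the value estimate and the q-values from the final sweep.
    """
    n_states = len(P)
    v = [0.0] * n_states
    q = []
    for _ in range(n_iters):
        # q(s, a) = r(s, a) + gamma * E[v(s')]
        q = [[R[s][a] + gamma * sum(p * v[s2] for s2, p in enumerate(P[s][a]))
              for a in range(len(P[s]))]
             for s in range(n_states)]
        # Regularized maximization over policies reduces to the transform.
        v = [soft_max_value(q[s], tau) for s in range(n_states)]
    return v, q

# Hypothetical toy MDP: action 0 moves to state 0, action 1 moves to state 1.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
R = [[0.0, 1.0],
     [0.0, 2.0]]

v, q = regularized_value_iteration(P, R, gamma=0.9, tau=0.5)
policy = [softmax_policy(q[s], tau=0.5) for s in range(len(P))]
```

Because this Ω takes values in [-τ ln|A|, 0], the regularized values should sit between the unregularized optimum and that optimum plus τ ln|A| / (1 - γ). Lowering `tau` moves the result toward standard value iteration.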