A Theory of Regularized Markov Decision Processes
Authors: Matthieu Geist, Bruno Scherrer, Olivier Pietquin
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose a general theory of regularized Markov Decision Processes that generalizes these approaches in two directions: we consider a larger class of regularizers, and we consider the general modified policy iteration approach, encompassing both policy iteration and value iteration. The core building blocks of this theory are a notion of regularized Bellman operator and the Legendre-Fenchel transform, a classical tool of convex optimization. This approach allows for error propagation analyses of general algorithmic schemes of which (possibly variants of) classical algorithms such as Trust Region Policy Optimization, Soft Q-learning, Stochastic Actor Critic or Dynamic Policy Programming are special cases. All proofs are provided in the appendix. |
| Researcher Affiliation | Collaboration | ¹Google Research, Brain Team. ²Université de Lorraine, CNRS, Inria, IECL, F-54000 Nancy, France. |
| Pseudocode | No | The paper presents mathematical formulations of algorithms, such as equation (1) for regularized modified policy iteration, but it does not include formal pseudocode blocks or an 'Algorithm' section. |
| Open Source Code | No | The paper does not provide any statements about releasing code or links to a code repository for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not present experiments with datasets. It refers to 'observed transitions $(s_i, a_i, r_i, s'_i)$' in the context of describing how existing algorithms (like Soft Q-learning) work in general, but not as data used in the authors' own experiments. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical experiments requiring dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any empirical experiments, so no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe any empirical experiments, so no software dependencies or version numbers are mentioned. |
| Experiment Setup | No | The paper is theoretical and focuses on mathematical derivations and analyses of a general framework. It does not describe an experimental setup, hyperparameters, or training settings for empirical evaluation. |
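
The Research Type entry names two building blocks: a regularized Bellman operator and the Legendre-Fenchel transform. As a rough illustration only, the sketch below takes the negative-entropy regularizer Ω(π) = Σ π(a) ln π(a), whose Legendre-Fenchel transform is the log-sum-exp of the action values and whose maximizer is the softmax policy, and iterates the resulting operator on a tiny MDP. The two-state MDP (`R`, `P`, `GAMMA`), the unit temperature, and all function names are assumptions made for this example; none of them come from the paper, which contains no code.

```python
import math

def logsumexp(q):
    """Legendre-Fenchel transform of negative entropy: ln(sum_a exp(q[a]))."""
    m = max(q)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(x - m) for x in q))

def softmax(q):
    """Policy attaining the maximum in the transform (gradient of logsumexp)."""
    m = max(q)
    w = [math.exp(x - m) for x in q]
    z = sum(w)
    return [x / z for x in w]

def neg_entropy(pi):
    """Regularizer Omega(pi) = sum_a pi[a] * ln(pi[a])."""
    return sum(p * math.log(p) for p in pi if p > 0.0)

def regularized_objective(pi, q):
    """<pi, q> - Omega(pi), the quantity maximized over policies pi."""
    return sum(p * x for p, x in zip(pi, q)) - neg_entropy(pi)

# Hypothetical 2-state, 2-action MDP (illustrative numbers only).
GAMMA = 0.9
R = [[1.0, 0.0], [0.0, 2.0]]                   # R[s][a]
P = [[[0.8, 0.2], [0.1, 0.9]],                 # P[s][a][s_next]
     [[0.5, 0.5], [0.3, 0.7]]]

def q_values(v):
    """q(s, a) = r(s, a) + gamma * E[v(s_next)]."""
    n = len(v)
    return [[R[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in range(n))
             for a in range(len(R[s]))] for s in range(n)]

def regularized_bellman(v):
    """Regularized Bellman optimality operator under negative entropy."""
    return [logsumexp(q_s) for q_s in q_values(v)]

def regularized_value_iteration(n_iters=500):
    """Repeatedly apply the operator, a gamma-contraction, starting from zero."""
    v = [0.0] * len(R)
    for _ in range(n_iters):
        v = regularized_bellman(v)
    return v

if __name__ == "__main__":
    v_star = regularized_value_iteration()
    greedy = [softmax(q_s) for q_s in q_values(v_star)]
    print("regularized values:", v_star)
    print("softmax policy:", greedy)
```

Choosing negative entropy keeps the transform in closed form. Other strongly convex regularizers fit the same pattern but would generally need a numerical maximization in place of `logsumexp`.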