Markov Decision Processes with Time-Varying Geometric Discounting
Authors: Jiarui Gan, Annika Hennes, Rupak Majumdar, Debmalya Mandal, Goran Radanovic
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper studies a model of infinite-horizon MDPs with time-varying discount factors. We take a game-theoretic perspective whereby each time step is treated as an independent decision maker with their own (fixed) discount factor and we study the subgame perfect equilibrium (SPE) of the resulting game as well as the related algorithmic problems. We present a constructive proof of the existence of an SPE and demonstrate the EXPTIME-hardness of computing an SPE. We also turn to the approximate notion of ϵ-SPE and show that an ϵ-SPE exists under milder assumptions. An algorithm is presented to compute an ϵ-SPE, of which an upper bound of the time complexity, as a function of the convergence property of the time-varying discount factor, is provided. |
| Researcher Affiliation | Academia | Jiarui Gan¹, Annika Hennes², Rupak Majumdar³, Debmalya Mandal³, Goran Radanovic³; ¹ University of Oxford, ² Heinrich-Heine-University Düsseldorf, ³ Max Planck Institute for Software Systems |
| Pseudocode | Yes | Algorithm 1: Constructing an SPE $\pi = (\pi_t)_{t=0}^{\infty}$, given that $\pi_t = \pi$ for all $t \ge T$; Algorithm 2: Computing an ϵ-SPE |
| Open Source Code | No | The paper does not provide concrete access to source code, such as a repository link or an explicit statement about code release for the described methodology. |
| Open Datasets | No | This paper is theoretical and does not involve the use of datasets for training. |
| Dataset Splits | No | This paper is theoretical and does not discuss validation datasets or splits. |
| Hardware Specification | No | This paper is theoretical and does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | This paper is theoretical and does not list any specific software dependencies with version numbers needed to replicate experimental results. |
| Experiment Setup | No | This paper is theoretical and does not describe an experimental setup with specific hyperparameters or system-level training settings. |
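
The Research Type row describes each time step as its own decision maker with a fixed discount factor. A minimal sketch of what that could mean numerically is given below. It assumes that the decision maker at step t values a reward k steps ahead with weight γ_t^k, and it truncates the infinite horizon to a finite reward list. The function name `agent_utility` and the example numbers are hypothetical illustrations, not taken from the paper.

```python
from typing import Sequence


def agent_utility(rewards: Sequence[float], gammas: Sequence[float], t: int) -> float:
    """Truncated utility of the decision maker at step t.

    Assumption (not from the paper's text above): the agent at step t
    weights the reward received k steps later by gammas[t] ** k.
    """
    gamma_t = gammas[t]
    total = 0.0
    weight = 1.0  # equals gamma_t ** k for the current offset k
    for reward in rewards[t:]:
        total += weight * reward
        weight *= gamma_t
    return total


# Hypothetical example: the same reward stream is valued differently
# by the step-0 agent (impatient) and the step-1 agent (patient).
rewards = [1.0, 1.0, 1.0, 1.0]
gammas = [0.5, 0.9, 0.9, 0.9]
u0 = agent_utility(rewards, gammas, 0)  # 1 + 0.5 + 0.25 + 0.125
u1 = agent_utility(rewards, gammas, 1)  # 1 + 0.9 + 0.81
```

Because the two agents disagree on the value of the same future, a plan that is optimal for the step-0 agent need not be followed by later agents, which is why an equilibrium notion such as SPE is used instead of a single optimal policy.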