Markov Decision Processes with Time-Varying Geometric Discounting

Authors: Jiarui Gan, Annika Hennes, Rupak Majumdar, Debmalya Mandal, Goran Radanovic

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper studies a model of infinite-horizon MDPs with time-varying discount factors. We take a game-theoretic perspective whereby each time step is treated as an independent decision maker with their own (fixed) discount factor and we study the subgame perfect equilibrium (SPE) of the resulting game as well as the related algorithmic problems. We present a constructive proof of the existence of an SPE and demonstrate the EXPTIME-hardness of computing an SPE. We also turn to the approximate notion of ϵ-SPE and show that an ϵ-SPE exists under milder assumptions. An algorithm is presented to compute an ϵ-SPE, of which an upper bound of the time complexity, as a function of the convergence property of the time-varying discount factor, is provided.
Researcher Affiliation | Academia | Jiarui Gan¹, Annika Hennes², Rupak Majumdar³, Debmalya Mandal³, Goran Radanovic³; ¹ University of Oxford; ² Heinrich-Heine-University Düsseldorf; ³ Max Planck Institute for Software Systems
Pseudocode | Yes | Algorithm 1: Constructing an SPE π = (π_t)_{t=0}^∞, given that π_t = π for all t ≥ T; Algorithm 2: Computing an ϵ-SPE
Open Source Code | No | The paper does not provide concrete access to source code, such as a repository link or an explicit statement about code release for the described methodology.
Open Datasets | No | This paper is theoretical and does not involve the use of datasets for training.
Dataset Splits | No | This paper is theoretical and does not discuss validation datasets or splits.
Hardware Specification | No | This paper is theoretical and does not describe any specific hardware used for running experiments.
Software Dependencies | No | This paper is theoretical and does not list any specific software dependencies with version numbers needed to replicate experimental results.
Experiment Setup | No | This paper is theoretical and does not describe an experimental setup with specific hyperparameters or system-level training settings.
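
The Research Type entry describes each time step as its own decision maker with a fixed discount factor. The following is a minimal illustrative sketch of that idea, not the paper's Algorithm 1 or Algorithm 2: it assumes a finite-horizon truncation (rewards after the last step are ignored), a small tabular MDP, and hypothetical names (`backward_induction`, `EXAMPLE_P`, `EXAMPLE_R`) chosen only for this example. The agent at step t is assumed to value a reward received k steps later at gammas[t] ** k times its size, and the policy is built by backward induction over the steps.

```python
def backward_induction(P, R, gammas):
    """Backward induction on a finite-horizon truncation (an assumption of this sketch).

    P[s][a][s2] : probability of moving from state s to s2 under action a
    R[s][a]     : immediate reward for action a in state s
    gammas[t]   : fixed discount factor of the decision maker at step t
    Returns policy[t][s], the action chosen at step t in state s.
    """
    n_states = len(R)
    n_actions = len(R[0])
    horizon = len(gammas)
    policy = [[0] * n_states for _ in range(horizon)]

    for t in range(horizon - 1, -1, -1):
        g = gammas[t]
        # Value of the already-fixed later policies, re-evaluated with
        # agent t's own discount factor g (later agents used their own).
        cont = [0.0] * n_states
        for k in range(horizon - 1, t, -1):
            cont = [
                R[s][policy[k][s]]
                + g * sum(P[s][policy[k][s]][s2] * cont[s2] for s2 in range(n_states))
                for s in range(n_states)
            ]
        # Agent t best-responds to the later agents' choices.
        for s in range(n_states):
            q = [
                R[s][a] + g * sum(P[s][a][s2] * cont[s2] for s2 in range(n_states))
                for a in range(n_actions)
            ]
            policy[t][s] = max(range(n_actions), key=lambda a: q[a])
    return policy


# Hypothetical two-state example: in state 0, action 0 pays 1 and stays,
# action 1 pays 0 and moves to state 1; state 1 is absorbing and pays 2
# under action 0 and 0 under action 1.
EXAMPLE_P = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]
EXAMPLE_R = [[1.0, 0.0], [2.0, 0.0]]

if __name__ == "__main__":
    print(backward_induction(EXAMPLE_P, EXAMPLE_R, [0.9, 0.1, 0.5]))
```

In this toy instance the patient step-0 agent (discount 0.9) gives up the immediate reward to reach state 1, while the impatient step-1 agent (discount 0.1) would not. The inner loop re-evaluates the later policies for every step, so the sketch costs on the order of the horizon squared; that is a property of this illustration and says nothing about the complexity bounds stated in the paper.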