Planning in entropy-regularized Markov decision processes and games

Authors: Jean-Bastien Grill, Omar Darwiche Domingues, Pierre Ménard, Rémi Munos, Michal Valko

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. ... Our main contribution is an algorithm that estimates the value function in a given state in planning problems that satisfy specific smoothness conditions... We exploit this smoothness property to obtain a polynomial sample complexity of order Õ(1/ε⁴) that is problem independent.
Researcher Affiliation | Collaboration | Jean-Bastien Grill, DeepMind Paris (jbgrill@google.com); Omar D. Domingues, SequeL team, Inria Lille (omar.darwiche-domingues@inria.fr); Pierre Ménard, SequeL team, Inria Lille (pierre.menard@inria.fr); Rémi Munos, DeepMind Paris (munos@google.com); Michal Valko, DeepMind Paris (valkom@deepmind.com)
Pseudocode | Yes | Algorithm 1 SmoothCruiser, Algorithm 2 sampleV, Algorithm 3 estimateQ, Algorithm 4 generic MCTS, Algorithm 5 search
Open Source Code | No | The paper does not provide any links to open-source code or explicitly state that the code for the described methodology is publicly available.
Open Datasets | No | This paper presents theoretical work on planning algorithms and does not involve empirical training on a specific dataset, so no public dataset access information is provided.
Dataset Splits | No | The paper focuses on theoretical analysis and algorithm design without conducting empirical experiments that require train/validation/test dataset splits.
Hardware Specification | No | The paper is a theoretical study and does not describe any specific hardware used for experiments.
Software Dependencies | No | The paper is a theoretical contribution focusing on algorithm design and analysis, and thus does not list specific software dependencies with version numbers for experimental reproducibility.
Experiment Setup | No | The paper is a theoretical work on planning algorithms and does not include details on experimental setup such as hyperparameters or training configurations.
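The Research Type row refers to the smoothness that entropy regularization gives the value function. As an illustration only, the sketch below shows the standard Shannon-entropy-regularized backup with a temperature λ (named `lam` here): a maximizing node takes the log-sum-exp λ log Σ_a exp(Q(s,a)/λ) of its action values instead of the hard max, and a minimizing node in a two-player game takes the mirrored smooth minimum. The gradient of this backup with respect to the Q-values is the softmax (Boltzmann) policy, so the backup is differentiable with a gradient that is Lipschitz with constant at most 1/λ, a property the hard max lacks. The helper names `smooth_max`, `smooth_min` and `softmax_policy` are hypothetical, the paper's normalization convention may differ (for example, regularizing toward a uniform reference policy subtracts λ log|A|), and this is not an implementation of SmoothCruiser or of its sampleV/estimateQ subroutines.

```python
import math
from typing import List, Sequence


# Hypothetical helpers: a sketch of the standard entropy-regularized
# (log-sum-exp) backup, NOT the paper's SmoothCruiser algorithm.

def smooth_max(q_values: Sequence[float], lam: float) -> float:
    """Value of a maximizing node: lam * log(sum_a exp(q_a / lam)).

    The largest Q-value is factored out before exponentiating so that
    small temperatures do not overflow.
    """
    if lam <= 0:
        raise ValueError("temperature lam must be positive")
    m = max(q_values)
    return m + lam * math.log(sum(math.exp((q - m) / lam) for q in q_values))


def smooth_min(q_values: Sequence[float], lam: float) -> float:
    """Value of a minimizing (opponent) node in a two-player zero-sum game."""
    return -smooth_max([-q for q in q_values], lam)


def softmax_policy(q_values: Sequence[float], lam: float) -> List[float]:
    """Gradient of smooth_max w.r.t. the Q-values: the Boltzmann policy."""
    m = max(q_values)
    weights = [math.exp((q - m) / lam) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    q = [1.0, 0.5, 0.0]
    print(smooth_max(q, 0.1), smooth_min(q, 0.1), softmax_policy(q, 0.1))
```

For Q = [1.0, 0.5, 0.0] and λ = 0.1, the smooth maximum lies between max Q = 1.0 and 1.0 + 0.1·log 3, and it equals the expected Q-value under the softmax policy plus λ times that policy's entropy. As λ shrinks the backup approaches the hard max; a larger λ spreads the policy over more actions and makes the value a smoother function of the Q-values.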