Planning in entropy-regularized Markov decision processes and games
Authors: Jean-Bastien Grill, Omar Darwiche Domingues, Pierre Ménard, Rémi Munos, Michal Valko
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. ... Our main contribution is an algorithm that estimates the value function in a given state in planning problems that satisfy specific smoothness conditions... We exploit this smoothness property to obtain a polynomial sample complexity of order $\tilde{O}(1/\varepsilon^4)$ that is problem independent. |
| Researcher Affiliation | Collaboration | Jean-Bastien Grill (DeepMind Paris, jbgrill@google.com); Omar D. Domingues (SequeL team, Inria Lille, omar.darwiche-domingues@inria.fr); Pierre Ménard (SequeL team, Inria Lille, pierre.menard@inria.fr); Rémi Munos (DeepMind Paris, munos@google.com); Michal Valko (DeepMind Paris, valkom@deepmind.com) |
| Pseudocode | Yes | Algorithm 1 (SmoothCruiser), Algorithm 2 (sampleV), Algorithm 3 (estimateQ), Algorithm 4 (generic MCTS), Algorithm 5 (search) |
| Open Source Code | No | The paper does not provide any links to open-source code or explicitly state that the code for the described methodology is publicly available. |
| Open Datasets | No | This paper presents theoretical work on planning algorithms and does not involve empirical training on a specific dataset, thus no public dataset access information is provided. |
| Dataset Splits | No | The paper focuses on theoretical analysis and algorithm design without conducting empirical experiments that require train/validation/test dataset splits. |
| Hardware Specification | No | The paper is a theoretical study and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is a theoretical contribution focusing on algorithm design and analysis, and thus does not list specific software dependencies with version numbers for experimental reproducibility. |
| Experiment Setup | No | The paper is a theoretical work on planning algorithms and does not include details on experimental setup such as hyperparameters or training configurations. |
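The smoothness that SmoothCruiser exploits comes from entropy regularization: adding λ times the policy's entropy to the expected Q-value turns the hard max in the Bellman backup into a smooth log-sum-exp, whose gradient is the softmax policy. The sketch below illustrates only this standard operator for a single state with a finite action set and known Q-values. It is not an implementation of SmoothCruiser or any of its subroutines, and the names `smooth_max`, `softmax_policy`, `regularized_objective`, and the temperature parameter `lam` are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def smooth_max(q_values: Sequence[float], lam: float) -> float:
    """Entropy-regularized max: lam * log(sum_a exp(Q(a) / lam)).

    Subtracting the largest Q-value before exponentiating avoids overflow.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    m = max(q_values)
    return m + lam * math.log(sum(math.exp((q - m) / lam) for q in q_values))


def softmax_policy(q_values: Sequence[float], lam: float) -> List[float]:
    """Policy maximizing the regularized objective; equals the gradient of smooth_max."""
    m = max(q_values)
    weights = [math.exp((q - m) / lam) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def regularized_objective(policy: Sequence[float], q_values: Sequence[float], lam: float) -> float:
    """Expected Q-value under the policy plus lam times its Shannon entropy."""
    expected_q = sum(p * q for p, q in zip(policy, q_values))
    entropy = -sum(p * math.log(p) for p in policy if p > 0)
    return expected_q + lam * entropy


if __name__ == "__main__":
    q = [1.0, 2.0, 0.5]
    lam = 0.5
    pi = softmax_policy(q, lam)
    # The closed form and the objective evaluated at the softmax policy coincide.
    print(smooth_max(q, lam), regularized_objective(pi, q, lam))
    # As lam shrinks, the smooth max approaches the hard max of q.
    print(smooth_max(q, 1e-3))
```

Two properties make the operator convenient for planning: it is bounded by max(Q) and max(Q) + λ log K for K actions, and it is differentiable with gradient equal to the softmax policy, so small errors in the Q-estimates change the value only smoothly.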