reproducibilityindex.ai

Planning in entropy-regularized Markov decision processes and games

Authors: Jean-Bastien Grill, Omar Darwiche Domingues, Pierre Ménard, Rémi Munos, Michal Valko

NeurIPS 2019

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. ... Our main contribution is an algorithm that estimates the value function in a given state in planning problems that satisfy specific smoothness conditions... We exploit this smoothness property to obtain a polynomial sample complexity of order Õ(1/ε⁴) that is problem independent.
Researcher Affiliation	Collaboration	Jean-Bastien Grill, DeepMind Paris, jbgrill@google.com; Omar D. Domingues, SequeL team, Inria Lille, omar.darwiche-domingues@inria.fr; Pierre Ménard, SequeL team, Inria Lille, pierre.menard@inria.fr; Rémi Munos, DeepMind Paris, munos@google.com; Michal Valko, DeepMind Paris, valkom@deepmind.com
Pseudocode	Yes	Algorithm 1 SmoothCruiser, Algorithm 2 sampleV, Algorithm 3 estimateQ, Algorithm 4 genericMCTS, Algorithm 5 search
Open Source Code	No	The paper does not provide any links to open-source code or explicitly state that the code for the described methodology is publicly available.
Open Datasets	No	This paper presents theoretical work on planning algorithms and does not involve empirical training on a specific dataset, thus no public dataset access information is provided.
Dataset Splits	No	The paper focuses on theoretical analysis and algorithm design without conducting empirical experiments that require train/validation/test dataset splits.
Hardware Specification	No	The paper is a theoretical study and does not describe any specific hardware used for experiments.
Software Dependencies	No	The paper is a theoretical contribution focusing on algorithm design and analysis, and thus does not list specific software dependencies with version numbers for experimental reproducibility.
Experiment Setup	No	The paper is a theoretical work on planning algorithms and does not include details on experimental setup such as hyperparameters or training configurations.