Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Planning in entropy-regularized Markov decision processes and games
Authors: Jean-Bastien Grill, Omar Darwiche Domingues, Pierre Menard, Remi Munos, Michal Valko
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. ... Our main contribution is an algorithm that estimates the value function in a given state in planning problems that satisfy specific smoothness conditions... We exploit this smoothness property to obtain a polynomial sample complexity of order $\widetilde{O}(1/\varepsilon^4)$ that is problem independent. |
| Researcher Affiliation | Collaboration | Jean-Bastien Grill DeepMind Paris EMAIL, Omar D. Domingues SequeL team, Inria Lille EMAIL, Pierre Ménard SequeL team, Inria Lille EMAIL, Rémi Munos DeepMind Paris EMAIL, Michal Valko DeepMind Paris EMAIL |
| Pseudocode | Yes | Algorithm 1 SmoothCruiser, Algorithm 2 sampleV, Algorithm 3 estimateQ, Algorithm 4 genericMCTS, Algorithm 5 search |
| Open Source Code | No | The paper does not provide any links to open-source code or explicitly state that the code for the described methodology is publicly available. |
| Open Datasets | No | This paper presents theoretical work on planning algorithms and does not involve empirical training on a specific dataset, thus no public dataset access information is provided. |
| Dataset Splits | No | The paper focuses on theoretical analysis and algorithm design without conducting empirical experiments that require train/validation/test dataset splits. |
| Hardware Specification | No | The paper is a theoretical study and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is a theoretical contribution focusing on algorithm design and analysis, and thus does not list specific software dependencies with version numbers for experimental reproducibility. |
| Experiment Setup | No | The paper is a theoretical work on planning algorithms and does not include details on experimental setup such as hyperparameters or training configurations. |