Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning

Authors: Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | "In this work, we study multiple-step greedy algorithms in more practical setups. We begin by highlighting a counter-intuitive difficulty... we formulate and analyze online and approximate algorithms that use such a multi-step greedy operator." The discussion adds that "a next indisputable step would be to empirically evaluate implementations of the algorithms presented here."
Researcher Affiliation | Academia | Yonathan Efroni (jonathan.efroni@gmail.com), Gal Dalal (gald@campus.technion.ac.il), Bruno Scherrer (bruno.scherrer@inria.fr), Shie Mannor (shie@ee.technion.ac.il); Department of Electrical Engineering, Technion, Israel Institute of Technology; INRIA, Villers les Nancy, France
Pseudocode | Yes | Algorithm 1: Two-Timescale Online κ-Policy-Iteration; Algorithm 2: κ-API; Algorithm 3: κ-PSDP. (A hedged sketch of the κ-greedy step these algorithms share follows the table.)
Open Source Code | No | The paper does not contain any statement about making its source code available. The discussion section states: "Lastly, a next indisputable step would be to empirically evaluate implementations of the algorithms presented here."
Open Datasets | No | The paper is theoretical and does not use datasets. It defines an MDP framework but does not mention specific training data.
Dataset Splits | No | The paper is theoretical and does not conduct experiments with datasets, so no dataset splits are discussed.
Hardware Specification | No | The paper is theoretical and does not report experiments, so no hardware specifications are provided.
Software Dependencies | No | The paper is theoretical and does not report experiments, so no software dependencies with version numbers are provided.
Experiment Setup | No | The paper is theoretical and does not report experiments, so no experiment setup details such as hyperparameters or training settings are provided.
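To make the listed pseudocode concrete, here is a minimal tabular sketch of the κ-greedy improvement step underlying Algorithms 1-3. Per the paper, the κ-greedy policy with respect to a value function v is the optimal policy of a surrogate MDP with shaped reward r + (1 - κ)γPv and discount factor κγ (κ = 0 recovers the usual one-step greedy policy; κ = 1 amounts to solving the original MDP). The function names, the value-iteration inner solver, and the exact policy-evaluation step below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def kappa_greedy_policy(P, r, v, gamma, kappa, tol=1e-8, max_iters=10_000):
    """kappa-greedy policy w.r.t. v: solve the surrogate MDP with shaped
    reward r + (1 - kappa) * gamma * P v and discount kappa * gamma.
    P: transition tensor of shape [S, A, S]; r: rewards of shape [S, A]."""
    r_shaped = r + (1.0 - kappa) * gamma * (P @ v)  # shaped reward, shape [S, A]
    u = np.zeros(P.shape[0])
    for _ in range(max_iters):                      # value iteration on the surrogate MDP
        q = r_shaped + kappa * gamma * (P @ u)      # surrogate Q-values, shape [S, A]
        u_new = q.max(axis=1)
        if np.abs(u_new - u).max() < tol:
            break
        u = u_new
    return q.argmax(axis=1)                         # greedy policy of the surrogate MDP

def kappa_policy_iteration(P, r, gamma, kappa, n_outer=50):
    """Exact kappa-PI: alternate a kappa-greedy improvement step with exact
    policy evaluation, v_pi = (I - gamma * P_pi)^{-1} r_pi."""
    S = P.shape[0]
    v = np.zeros(S)
    for _ in range(n_outer):
        pi = kappa_greedy_policy(P, r, v, gamma, kappa)
        P_pi = P[np.arange(S), pi]                  # [S, S] transitions under pi
        r_pi = r[np.arange(S), pi]                  # [S] rewards under pi
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return pi, v
```

κ-API and κ-PSDP, as analyzed in the paper, replace the exact evaluation and improvement steps with approximate ones; the sketch only illustrates the exact κ-PI scheme that those algorithms approximate.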