Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning
Authors: Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | The paper states: "In this work, we study multiple-step greedy algorithms in more practical setups. We begin by highlighting a counter-intuitive difficulty... we formulate and analyze online and approximate algorithms that use such a multi-step greedy operator." The discussion adds that "a next indisputable step would be to empirically evaluate implementations of the algorithms presented here," confirming no experiments are included. |
| Researcher Affiliation | Academia | Yonathan Efroni (jonathan.efroni@gmail.com), Gal Dalal (gald@campus.technion.ac.il), and Shie Mannor (shie@ee.technion.ac.il), Department of Electrical Engineering, Technion, Israel Institute of Technology; Bruno Scherrer (bruno.scherrer@inria.fr), INRIA, Villers-lès-Nancy, France. |
| Pseudocode | Yes | Algorithm 1 (Two-Timescale Online κ-Policy-Iteration), Algorithm 2 (κ-API), and Algorithm 3 (κ-PSDP). |
| Open Source Code | No | The paper does not contain any statement about making its source code available. The discussion section states: "Lastly, a next indisputable step would be to empirically evaluate implementations of the algorithms presented here." |
| Open Datasets | No | The paper is theoretical and does not use datasets. It defines an MDP framework but does not mention specific training data. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments with datasets, thus no dataset splits are discussed. |
| Hardware Specification | No | The paper is theoretical and does not report on experiments, thus no hardware specifications are provided. |
| Software Dependencies | No | The paper is theoretical and does not report on experiments, thus no software dependencies with version numbers are provided. |
| Experiment Setup | No | The paper is theoretical and does not report on experiments, thus no experimental setup details like hyperparameters or training settings are provided. |
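Although the paper itself provides only pseudocode, the κ-greedy operator underlying the algorithms listed in the Pseudocode row can be sketched in a tabular setting: a κ-greedy policy acts optimally in a surrogate MDP with discount κγ and reward shaped by the current value estimate. The sketch below is illustrative only, assuming a small tabular MDP with exact policy evaluation (the paper's contributions concern the approximate and online variants); all function names (`kappa_greedy`, `evaluate`, `kappa_pi`) are ours, not the paper's.

```python
import numpy as np

def kappa_greedy(P, r, v, gamma, kappa, sweeps=1000):
    """kappa-greedy policy w.r.t. v (sketch).

    Acts greedily in a surrogate MDP with discount kappa*gamma and
    shaped reward r + (1 - kappa) * gamma * P v, solved here by plain
    value iteration. kappa=0 recovers the usual 1-step greedy policy;
    kappa=1 solves the original MDP outright.
    """
    r_k = r + (1.0 - kappa) * gamma * (P @ v)   # (S, A) shaped reward
    u = np.zeros(P.shape[0])
    for _ in range(sweeps):                     # VI on the surrogate MDP
        u = (r_k + kappa * gamma * (P @ u)).max(axis=1)
    q = r_k + kappa * gamma * (P @ u)
    return q.argmax(axis=1)

def evaluate(P, r, pi, gamma):
    """Exact policy evaluation: v = (I - gamma * P_pi)^{-1} r_pi."""
    S = P.shape[0]
    P_pi = P[np.arange(S), pi]                  # (S, S) under policy pi
    r_pi = r[np.arange(S), pi]                  # (S,)
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

def kappa_pi(P, r, gamma, kappa, iters=50):
    """Idealized kappa-policy-iteration loop: improve, then evaluate."""
    v = np.zeros(P.shape[0])
    for _ in range(iters):
        pi = kappa_greedy(P, r, v, gamma, kappa)
        v = evaluate(P, r, pi, gamma)
    return pi, v

# Usage on a small random MDP (P: (S, A, S) transition tensor, r: (S, A)).
rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=-1, keepdims=True)
r = rng.random((S, A))
pi, v = kappa_pi(P, r, gamma, kappa=0.5)
```

With exact evaluation this idealized loop converges to the optimal value function for any κ in [0, 1]; the paper's analysis concerns what happens when the evaluation and improvement steps are approximate or online instead.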