A Unifying View of Optimism in Episodic Reinforcement Learning
Authors: Gergely Neu, Ciara Pike-Burke
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper we provide a general framework for designing, analyzing and implementing such algorithms in the episodic reinforcement learning problem. This framework is built upon Lagrangian duality, and demonstrates that every model-optimistic algorithm that constructs an optimistic MDP has an equivalent representation as a value-optimistic dynamic programming algorithm. ... we show that it is possible to get the best of both worlds by providing a class of algorithms which have a computationally efficient dynamic-programming implementation and also a simple probabilistic analysis. (An illustrative sketch of the value-optimistic dynamic programming this equivalence yields is given after the table.) |
| Researcher Affiliation | Academia | Gergely Neu, Universitat Pompeu Fabra, Barcelona, Spain (gergely.neu@gmail.com); Ciara Pike-Burke, Imperial College London, London, UK (c.pikeburke@gmail.com) |
| Pseudocode | No | The paper describes mathematical equations and algorithms conceptually but does not provide a formal pseudocode block or algorithm listing. |
| Open Source Code | No | The paper does not contain any statement about releasing open-source code or a link to a code repository. |
| Open Datasets | No | The paper is theoretical and does not describe experiments that use datasets; thus, there is no mention of dataset availability for training. |
| Dataset Splits | No | The paper is theoretical and does not describe experiments that use datasets; thus, there is no information about training/validation/test splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or the hardware used. |
| Software Dependencies | No | The paper is theoretical and does not describe any experimental setup or software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any experimental setup, hyperparameters, or training configurations. |
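
To make the framework's central claim concrete, the following is a minimal sketch of value-optimistic dynamic programming for a finite episodic MDP. This is not the paper's algorithm: the function name `optimistic_value_iteration`, the Hoeffding-style bonus, and all array shapes are illustrative assumptions. The paper's contribution is showing, via Lagrangian duality, that planning in an optimistic model (the model-optimistic view) is equivalent to this kind of bonus-augmented backward induction (the value-optimistic view).

```python
import numpy as np

def optimistic_value_iteration(P_hat, r_hat, bonus, H):
    """Value-optimistic backward induction for an episodic MDP.

    A minimal sketch, assuming tabular arrays (not the paper's exact method):
    optimism enters as a per-(s, a) exploration bonus added to the Bellman
    backup over the empirical model.

    P_hat : (S, A, S) empirical transition probabilities
    r_hat : (S, A)    empirical mean rewards in [0, 1]
    bonus : (S, A)    confidence-width bonuses (e.g. Hoeffding-style)
    H     : horizon (number of steps per episode)
    """
    S, A, _ = P_hat.shape
    V = np.zeros(S)                       # terminal value V_H = 0
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        # Optimistic Q-values: empirical backup plus exploration bonus.
        Q = r_hat + bonus + P_hat @ V     # shape (S, A)
        Q = np.minimum(Q, H - h)          # clip to the maximum achievable return
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return V, policy

# Tiny usage example on a random 3-state, 2-action MDP.
rng = np.random.default_rng(0)
P_hat = rng.dirichlet(np.ones(3), size=(3, 2))   # transition rows sum to 1
r_hat = rng.uniform(size=(3, 2))
n = rng.integers(1, 50, size=(3, 2))             # hypothetical visit counts
bonus = np.sqrt(1.0 / n)                         # illustrative Hoeffding-style bonus
V0, pi = optimistic_value_iteration(P_hat, r_hat, bonus, H=5)
```

Whenever the bonuses dominate the estimation error of the empirical model, the resulting `V0` upper-bounds the true optimal value, which is the optimism property the paper's regret analyses rely on.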