Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning Near Optimal Policies with Low Inherent Bellman Error
Authors: Andrea Zanette, Alessandro Lazaric, Mykel Kochenderfer, Emma Brunskill
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the exploration problem with approximate linear action-value functions in episodic reinforcement learning under the notion of low inherent Bellman error... While computational tractability questions remain open for the MDP setting, this enriches the class of MDPs with a linear representation for the action-value function where statistically efficient reinforcement learning is possible. |
| Researcher Affiliation | Collaboration | ¹Stanford University, ²Facebook Artificial Intelligence Research. Correspondence to: Andrea Zanette <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 ELEANOR |
| Open Source Code | No | Although ELEANOR is proved to be near optimal, it is difficult to implement the algorithm efficiently. This should not be seen as a fundamental barrier, however... For now, we leave this to future work. |
| Open Datasets | No | The paper is theoretical and does not report on experiments with datasets, thus no information on training datasets is provided. |
| Dataset Splits | No | The paper is theoretical and does not report on experiments with datasets, thus no information on training/validation/test splits is provided. |
| Hardware Specification | No | The paper is theoretical and does not report on experiments or provide any hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not report on experiments or provide any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not report on experiments or provide any experimental setup details such as hyperparameters. |