Reward-Free Exploration for Reinforcement Learning
Authors: Chi Jin, Akshay Krishnamurthy, Max Simchowitz, Tiancheng Yu
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We give an efficient algorithm that conducts $O(S^2 A\,\mathrm{poly}(H)/\epsilon^2)$ episodes of exploration and returns $\epsilon$-suboptimal policies for an arbitrary number of reward functions. We also give a nearly-matching $\Omega(S^2 A H^2/\epsilon^2)$ lower bound, demonstrating the near-optimality of our algorithm in this setting. |
| Researcher Affiliation | Collaboration | ¹Princeton University; ²Microsoft Research, New York; ³University of California, Berkeley; ⁴Massachusetts Institute of Technology. |
| Pseudocode | Yes | Algorithm 2 (Reward-free RL-Explore), Algorithm 3 (Reward-free RL-Plan), and Algorithm 4 (Natural Policy Gradient, NPG); a minimal sketch of the explore-then-plan structure these algorithms describe appears below the table. |
| Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and theoretical bounds for reinforcement learning in MDPs. It does not describe experiments that involve training on a specific, publicly available dataset. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments or specific dataset splits (training, validation, test) needed for reproduction. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not provide specific software dependencies or version numbers needed to replicate any experimental setup. |
| Experiment Setup | No | The paper is theoretical and does not describe specific experimental setup details such as hyperparameter values or training configurations. |
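
As context for the pseudocode row above, the following is a minimal sketch of the explore-then-plan structure that reward-free RL refers to, written under simplifying assumptions; it is not the authors' implementation. In particular, `sample_episode` is a hypothetical callable standing in for the paper's exploration phase (Algorithm 2 drives exploration with regret-minimizing policies targeted at individual states, which this sketch replaces with a generic behaviour policy), while the planning step mirrors the plug-in idea of Algorithm 3: estimate an empirical transition model from the exploration data, then run finite-horizon value iteration against any reward function supplied afterwards.

```python
import numpy as np

def collect_transition_counts(sample_episode, S, A, H, num_episodes):
    """Exploration phase (simplified stand-in for Algorithm 2).

    The paper drives exploration with regret-minimizing policies aimed at
    individual states; here we simply roll out whatever behaviour policy
    `sample_episode` encodes (a hypothetical callable yielding
    (h, s, a, s_next) tuples for one H-step episode) and count transitions.
    """
    counts = np.zeros((H, S, A, S))
    for _ in range(num_episodes):
        for h, s, a, s_next in sample_episode():
            counts[h, s, a, s_next] += 1.0
    return counts

def plan_with_reward(counts, reward):
    """Planning phase in the spirit of Algorithm 3: build the empirical
    transition model from the exploration data, then run finite-horizon
    value iteration for an arbitrary reward array of shape (H, S, A)."""
    H, S, A, _ = counts.shape
    n_sa = counts.sum(axis=-1, keepdims=True)        # visit counts per (h, s, a)
    p_hat = np.where(n_sa > 0, counts / np.maximum(n_sa, 1.0), 1.0 / S)
    V = np.zeros((H + 1, S))                         # value-to-go per step and state
    pi = np.zeros((H, S), dtype=int)                 # greedy policy
    for h in reversed(range(H)):
        Q = reward[h] + p_hat[h] @ V[h + 1]          # (S, A) action values
        pi[h] = Q.argmax(axis=1)
        V[h] = Q.max(axis=1)
    return pi, V
```

A caller would run `collect_transition_counts` once and then reuse the same `counts` with `plan_with_reward` for every reward function of interest, which is the point of the reward-free setting the paper studies.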