Near-Optimal Reward-Free Exploration for Linear Mixture MDPs with Plug-in Solver
Authors: Xiaoyu Chen, Jiachen Hu, Lin Yang, Liwei Wang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We show that, by establishing a novel exploration algorithm, the plug-in approach learns a model by taking $O(d^2H^3/\epsilon^2)$ episodes with the environment and any $\epsilon$-optimal planner on the model gives an $O(\epsilon)$-optimal policy on the original model. This sample complexity matches our lower bound for non-plug-in approaches and is statistically optimal. |
| Researcher Affiliation | Academia | Xiaoyu Chen & Jiachen Hu: Key Laboratory of Machine Perception, MOE, School of Artificial Intelligence, Peking University ({cxy30, Nick H}@pku.edu.cn); Lin F. Yang: Electrical and Computer Engineering Department, University of California, Los Angeles (linyang@ee.ucla.edu); Liwei Wang: Key Laboratory of Machine Perception, MOE, School of Artificial Intelligence, Peking University, and International Center for Machine Learning Research, Peking University (wanglw@cis.pku.edu.cn) |
| Pseudocode | Yes | Algorithm 1: Reward-free Exploration: Exploration Phase; Algorithm 2: Reward-free Exploration: Exploration Phase; Algorithm 3: Reward-free Exploration for Linear MDPs: Exploration Phase; Algorithm 4: Reward-free Exploration for Linear MDPs: Planning Phase |
| Open Source Code | No | The paper does not provide any concrete access to source code or mention that code is being released. |
| Open Datasets | No | The paper is theoretical and does not describe experiments involving datasets. Therefore, no information about publicly available datasets is provided. |
| Dataset Splits | No | The paper is theoretical and does not describe experiments. Therefore, no information about training/validation/test splits is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe empirical experiments. Therefore, no hardware specifications for running experiments are provided. |
| Software Dependencies | No | The paper is theoretical and does not describe empirical experiments. Therefore, no specific software dependencies with version numbers are provided. |
| Experiment Setup | No | The paper is theoretical and does not describe empirical experiments. Therefore, no experimental setup details like hyperparameters or training settings are provided. |