Near-Optimal Reward-Free Exploration for Linear Mixture MDPs with Plug-in Solver

Authors: Xiaoyu Chen, Jiachen Hu, Lin Yang, Liwei Wang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We show that, by establishing a novel exploration algorithm, the plug-in approach learns a model by taking O(d^2 H^3 / ϵ^2) episodes of interaction with the environment, and any ϵ-optimal planner on the learned model gives an O(ϵ)-optimal policy on the original model. This sample complexity matches our lower bound for non-plug-in approaches and is statistically optimal. (The bound is restated in display form after the table.)
Researcher Affiliation | Academia | Xiaoyu Chen and Jiachen Hu, Key Laboratory of Machine Perception, MOE, School of Artificial Intelligence, Peking University ({cxy30, nickh}@pku.edu.cn); Lin F. Yang, Electrical and Computer Engineering Department, University of California, Los Angeles (linyang@ee.ucla.edu); Liwei Wang, Key Laboratory of Machine Perception, MOE, School of Artificial Intelligence, Peking University, and International Center for Machine Learning Research, Peking University (wanglw@cis.pku.edu.cn)
Pseudocode | Yes | Algorithm 1 (Reward-free Exploration: Exploration Phase); Algorithm 2 (Reward-free Exploration: Planning Phase); Algorithm 3 (Reward-free Exploration for Linear MDPs: Exploration Phase); Algorithm 4 (Reward-free Exploration for Linear MDPs: Planning Phase)
Open Source Code | No | The paper does not provide any concrete access to source code or mention that code is being released.
Open Datasets | No | The paper is theoretical and does not describe experiments involving datasets. Therefore, no information about publicly available datasets is provided.
Dataset Splits | No | The paper is theoretical and does not describe experiments. Therefore, no information about training/validation/test splits is provided.
Hardware Specification | No | The paper is theoretical and does not describe empirical experiments. Therefore, no hardware specifications for running experiments are provided.
Software Dependencies | No | The paper is theoretical and does not describe empirical experiments. Therefore, no specific software dependencies with version numbers are provided.
Experiment Setup | No | The paper is theoretical and does not describe empirical experiments. Therefore, no experimental setup details like hyperparameters or training settings are provided.
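
For readability, the sample-complexity claim quoted in the Research Type row is restated below as a display equation. This is a sketch of the quoted bound, not a verbatim statement from the paper: the symbols d (feature dimension of the linear mixture MDP), H (planning horizon), and ϵ (target accuracy) are the standard notation for this setting and are assumed here, since the table itself does not define them.

\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Sample-complexity bound quoted in the Research Type row, restated in
% display form. Assumed notation (not defined in the table): d is the
% feature dimension of the linear mixture MDP, H is the planning
% horizon, and \epsilon is the target accuracy.
\[
  N \;=\; O\!\left( \frac{d^{2} H^{3}}{\epsilon^{2}} \right)
  \ \text{exploration episodes}
  \;\Longrightarrow\;
  \text{any } \epsilon\text{-optimal planner on the learned model yields an } O(\epsilon)\text{-optimal policy.}
\]
\end{document}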