Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation

Authors: Pihe Hu, Yu Chen, Longbo Huang

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study reinforcement learning with linear function approximation where the transition probability and reward functions are linear with respect to a feature mapping ϕ(s, a). Specifically, we consider the episodic inhomogeneous linear Markov Decision Process (MDP), and propose a novel computationally efficient algorithm, LSVI-UCB+, which achieves an Õ(Hd√T) regret bound, where H is the episode length, d is the feature dimension, and T is the number of steps. LSVI-UCB+ builds on weighted ridge regression and upper confidence value iteration with a Bernstein-type exploration bonus. Our statistical results are obtained with novel analytical tools, including a new Bernstein self-normalized bound with conservatism of elliptical potentials, and refined analysis of the correction term. (A reconstruction of the weighted ridge regression step appears after this table.)
Researcher Affiliation | Academia | 1) Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China; 2) Department of Mathematical Sciences, Tsinghua University, Beijing, China.
Pseudocode | Yes | Algorithm 1: LSVI-UCB+ for Linear MDPs (an illustrative sketch of its core updates follows this table).
Open Source Code | No | The paper does not contain any statements about making code publicly available or links to a code repository.
Open Datasets | No | The paper is theoretical and focuses on algorithm design and regret bounds for linear MDPs. It does not mention the use of any specific publicly available datasets for training or evaluation.
Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with data splits. There is no mention of training, validation, or test dataset splits.
Hardware Specification | No | The paper describes a theoretical algorithm and its statistical properties. It does not mention any hardware specifications used for running experiments.
Software Dependencies | No | The paper is theoretical and focuses on algorithm design and proofs. It does not specify any software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and provides algorithmic details and proofs for regret bounds. It does not include an experimental setup section with hyperparameters or training settings.
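
To make the "weighted ridge regression" named in the abstract concrete: in this line of work the per-step weight vector is estimated by a variance-weighted least-squares solve. The display below is a reconstruction from standard formulations in the linear-MDP literature, not a quote from the paper; in particular, the variance estimates σ̄ and the exact indexing are the paper's to define.

    \hat{w}_h^k \;=\; \arg\min_{w \in \mathbb{R}^d}\;
      \sum_{j < k} \frac{1}{\bar{\sigma}_{h,j}^{2}}
      \Bigl( \phi(s_h^j, a_h^j)^{\top} w \;-\; r_h^j \;-\; \hat{V}_{h+1}(s_{h+1}^j) \Bigr)^{2}
      \;+\; \lambda \lVert w \rVert_2^2

Setting every σ̄ to 1 recovers the unweighted ridge regression of the original LSVI-UCB of Jin et al. (2020), whose regret is Õ(√(d³H³T)); the variance weighting is what enables the sharper Õ(Hd√T) bound claimed above.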
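
As a companion to the Pseudocode row above, here is a minimal, illustrative NumPy sketch of the two computations at the core of an LSVI-UCB+-style algorithm: the weighted ridge solve and the optimistic Q-estimate with an elliptical-potential bonus. All names here (weighted_ridge, ucb_q, sigma2, beta) are hypothetical, and the Bernstein-type bonus radius beta and the variance weights sigma2 are taken as given inputs; deriving them is the substance of the paper's analysis.

    import numpy as np

    def weighted_ridge(Phi, y, sigma2, lam=1.0):
        """Minimize  sum_k (Phi[k] @ w - y[k])**2 / sigma2[k]  +  lam * ||w||^2.

        Phi    : (n, d) feature matrix, rows phi(s_k, a_k)
        y      : (n,) regression targets, e.g. r_k + V_next(s'_k)
        sigma2 : (n,) per-sample variance weights (sigma-bar squared)
        """
        W = Phi / sigma2[:, None]                      # rows phi_k / sigma_k^2
        Lam = W.T @ Phi + lam * np.eye(Phi.shape[1])   # weighted Gram matrix
        w = np.linalg.solve(Lam, W.T @ y)              # closed-form minimizer
        return w, Lam

    def ucb_q(phi, w, Lam, beta, H):
        """Optimistic Q-estimate: linear prediction plus beta * ||phi||_{Lam^-1},
        clipped to [0, H] as is standard for episodic MDPs."""
        Lam_inv = np.linalg.inv(Lam)
        bonus = beta * np.sqrt(np.einsum('nd,de,ne->n', phi, Lam_inv, phi))
        return np.clip(phi @ w + bonus, 0.0, H)

In a full loop, these would run backwards over steps h = H, ..., 1 in each episode, with the optimistic values at step h+1 supplying the regression targets y at step h.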