Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes
Authors: Jiafan He, Heyang Zhao, Dongruo Zhou, Quanquan Gu
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study reinforcement learning (RL) with linear function approximation. For episodic time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition probability can be parameterized as a linear function of a given feature mapping, we propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3K})$... Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest. |
| Researcher Affiliation | Academia | ¹Department of Computer Science, University of California, Los Angeles, CA 90095, USA. |
| Pseudocode | Yes | Algorithm 1 LSVI-UCB++ |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and does not report on experimental evaluation using specific datasets, thus no publicly available dataset is mentioned or linked for training. |
| Dataset Splits | No | The paper is theoretical and does not report on experimental evaluation, thus no dataset splits for training, validation, or testing are provided. |
| Hardware Specification | No | The paper is theoretical and does not report on experimental evaluation, thus no hardware specifications for running experiments are provided. |
| Software Dependencies | No | The paper is theoretical and does not report on experimental evaluation or implementation details requiring specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not report on experimental evaluation, thus no details about experimental setup, hyperparameters, or training settings are provided. |