Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient

Authors: Ming Yin, Mengdi Wang, Yu-Xiang Wang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | "We show offline RL with differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning (PFQL) algorithm, and our results provide the theoretical basis for understanding a variety of practical heuristics that rely on Fitted Q-Iteration style design. In addition, we further improve our guarantee with a tighter instance-dependent characterization. We hope our work could draw interest in studying reinforcement learning with differentiable function approximation beyond the scope of current research."
Researcher Affiliation | Academia | Department of Computer Science, University of California, Santa Barbara; Department of Electrical and Computer Engineering, Princeton University
Pseudocode | Yes | Algorithm 1: Pessimistic Fitted Q-Learning (PFQL); Algorithm 2: Vanilla Fitted Q-Learning (VFQL); Algorithm 3: Variance-Aware Fitted Q-Learning (VAFQL)
Open Source Code | No | The paper does not state that its source code is available and provides no link to a code repository.
Open Datasets | No | This is a theoretical paper; it describes no empirical experiments and uses no datasets for training.
Dataset Splits | No | This is a theoretical paper; it describes no training, validation, or test splits.
Hardware Specification | No | This is a theoretical paper with no empirical experiments, so it mentions no hardware specifications.
Software Dependencies | No | This is a theoretical paper with no empirical experiments, so it lists no software dependencies with specific version numbers.
Experiment Setup | No | This is a theoretical paper with no empirical experiments, so it provides no experimental setup or hyperparameters.
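The audit notes that PFQL is a fitted Q-iteration style algorithm with a pessimistic adjustment, but the paper ships no code. As a rough illustration only, here is a minimal tabular sketch of pessimistic fitted Q-iteration: the count-based penalty, the coefficient `beta`, and the finite state/action setting are illustrative assumptions for this sketch, not the paper's differentiable-function-approximation analysis.

```python
import numpy as np

def pessimistic_fqi(dataset, num_states, num_actions, horizon, beta=0.1):
    """Illustrative tabular sketch of pessimistic fitted Q-iteration.

    dataset: list of (h, s, a, r, s_next) offline transitions.
    beta: pessimism coefficient scaling a count-based penalty
          (an assumption of this sketch, not the paper's estimator).
    """
    # Q[h, s, a]; Q[horizon] stays zero as the terminal value.
    Q = np.zeros((horizon + 1, num_states, num_actions))

    # Visitation counts drive the pessimism penalty: rarely visited
    # (s, a) pairs get their value estimates pushed down.
    counts = np.zeros((num_states, num_actions))
    for _, s, a, _, _ in dataset:
        counts[s, a] += 1

    for h in reversed(range(horizon)):
        # "Fit" Q_h by averaging regression targets r + max_a' Q_{h+1}(s', a')
        # over the offline transitions observed at step h.
        targets = np.zeros((num_states, num_actions))
        n = np.zeros((num_states, num_actions))
        for hh, s, a, r, s_next in dataset:
            if hh != h:
                continue
            targets[s, a] += r + Q[h + 1, s_next].max()
            n[s, a] += 1
        fitted = np.divide(targets, n, out=np.zeros_like(targets), where=n > 0)

        # Pessimism: subtract the uncertainty penalty, then clip at zero.
        penalty = beta / np.sqrt(np.maximum(counts, 1.0))
        Q[h] = np.clip(fitted - penalty, 0.0, None)

    # Output the greedy policy w.r.t. the pessimistic Q-estimates.
    policy = Q[:horizon].argmax(axis=2)
    return Q, policy
```

In this sketch the pessimism enters only through the subtracted penalty; the paper's actual algorithms instead operate with differentiable function approximators and a more refined uncertainty quantification.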