Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient
Authors: Ming Yin, Mengdi Wang, Yu-Xiang Wang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We show offline RL with differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning (PFQL) algorithm, and our results provide the theoretical basis for understanding a variety of practical heuristics that rely on Fitted Q-Iteration style design. In addition, we further improve our guarantee with a tighter instance-dependent characterization. We hope our work could draw interest in studying reinforcement learning with differentiable function approximation beyond the scope of current research. |
| Researcher Affiliation | Academia | Department of Computer Science; Department of Electrical and Computer Engineering; University of California, Santa Barbara; Princeton University |
| Pseudocode | Yes | Algorithm 1: Pessimistic Fitted Q-Learning (PFQL); Algorithm 2: Vanilla Fitted Q-Learning (VFQL); Algorithm 3: Variance-Aware Fitted Q-Learning (VAFQL) |
| Open Source Code | No | The paper does not provide any statement about making its source code available or links to a code repository. |
| Open Datasets | No | This is a theoretical paper and does not describe empirical experiments or the use of any datasets for training. |
| Dataset Splits | No | This is a theoretical paper and does not describe empirical experiments or data splits for training, validation, or testing. |
| Hardware Specification | No | This is a theoretical paper and does not describe empirical experiments; therefore, it does not mention any hardware specifications. |
| Software Dependencies | No | This is a theoretical paper and does not describe empirical experiments; therefore, it does not list any software dependencies with specific version numbers. |
| Experiment Setup | No | This is a theoretical paper and does not describe empirical experiments; therefore, it does not provide details about an experimental setup or hyperparameters. |