Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient
Authors: Ming Yin, Mengdi Wang, Yu-Xiang Wang
ICLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We show offline RL with differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning (PFQL) algorithm, and our results provide the theoretical basis for understanding a variety of practical heuristics that rely on Fitted Q-Iteration style design. In addition, we further improve our guarantee with a tighter instance-dependent characterization. We hope our work could draw interest in studying reinforcement learning with differentiable function approximation beyond the scope of current research. |
| Researcher Affiliation | Academia | Department of Computer Science; Department of Electrical and Computer Engineering; University of California, Santa Barbara; Princeton University |
| Pseudocode | Yes | Algorithm 1: Pessimistic Fitted Q-Learning (PFQL); Algorithm 2: Vanilla Fitted Q-Learning (VFQL); Algorithm 3: Variance-Aware Fitted Q Learning (VAFQL) |
| Open Source Code | No | The paper does not provide any statement about making its source code available or links to a code repository. |
| Open Datasets | No | This is a theoretical paper and does not describe empirical experiments or the use of any datasets for training. |
| Dataset Splits | No | This is a theoretical paper and does not describe empirical experiments or data splits for training, validation, or testing. |
| Hardware Specification | No | This is a theoretical paper and does not describe empirical experiments; therefore, it does not mention any hardware specifications. |
| Software Dependencies | No | This is a theoretical paper and does not describe empirical experiments; therefore, it does not list any software dependencies with specific version numbers. |
| Experiment Setup | No | This is a theoretical paper and does not describe empirical experiments; therefore, it does not provide details about an experimental setup or hyperparameters. |