reproducibilityindex.ai

Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient

Authors: Ming Yin, Mengdi Wang, Yu-Xiang Wang

ICLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We show offline RL with differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning (PFQL) algorithm, and our results provide the theoretical basis for understanding a variety of practical heuristics that rely on Fitted Q-Iteration style design. In addition, we further improve our guarantee with a tighter instance-dependent characterization. We hope our work could draw interest in studying reinforcement learning with differentiable function approximation beyond the scope of current research.
Researcher Affiliation	Academia	Department of Computer Science, University of California, Santa Barbara; Department of Electrical and Computer Engineering, Princeton University
Pseudocode	Yes	Algorithm 1: Pessimistic Fitted Q-Learning (PFQL); Algorithm 2: Vanilla Fitted Q-Learning (VFQL); Algorithm 3: Variance-Aware Fitted Q Learning (VAFQL)
Open Source Code	No	The paper does not provide any statement about making its source code available or links to a code repository.
Open Datasets	No	This is a theoretical paper and does not describe empirical experiments or the use of any datasets for training.
Dataset Splits	No	This is a theoretical paper and does not describe empirical experiments or data splits for training, validation, or testing.
Hardware Specification	No	This is a theoretical paper and does not describe empirical experiments, so it does not mention any hardware specifications.
Software Dependencies	No	This is a theoretical paper and does not describe empirical experiments, so it does not list any software dependencies with specific version numbers.
Experiment Setup	No	This is a theoretical paper and does not describe empirical experiments, so it does not provide details about an experimental setup or hyperparameters.