Neural Network Approximation for Pessimistic Offline Reinforcement Learning

Authors: Di Wu, Yuling Jiao, Li Shen, Haizhao Yang, Xiliang Lu

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, the authors establish a non-asymptotic estimation error bound for pessimistic offline RL using general neural network approximation with C-mixing data, stated in terms of the network architecture, the dimension of the dataset, and the concentrability of the data coverage, under mild assumptions. The result shows that the estimation error consists of two parts: the first converges to zero at a desired rate in the sample size with partially controllable concentrability, and the second becomes negligible as the residual constraint tightens. This demonstrates the explicit efficiency of deep adversarial offline RL frameworks.
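The two-part bound described in the abstract can be sketched schematically as follows. This is an illustrative decomposition only, not the paper's exact statement: the symbols (the learned value function, the concentrability coefficient C, the rate α, and the residual term Δ(ε)) are assumptions chosen to mirror the abstract's description.

```latex
% Schematic form of the two-part error bound (illustrative; not the paper's exact result).
% \widehat{Q}: estimated value function; Q^{*}: target; n: sample size;
% C: (partially controllable) concentrability coefficient; \alpha > 0: a rate
% determined by the network class; \Delta(\epsilon): a term driven by the
% residual-constraint tolerance \epsilon.
\begin{equation*}
  \mathbb{E}\,\bigl\| \widehat{Q} - Q^{*} \bigr\|
  \;\lesssim\;
  \underbrace{C \cdot n^{-\alpha}}_{\text{vanishes as } n \to \infty}
  \;+\;
  \underbrace{\Delta(\epsilon)}_{\text{negligible as the residual constraint tightens}}
\end{equation*}
```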
Researcher Affiliation | Collaboration | Di Wu (1), Yuling Jiao (1, 2), Li Shen (3), Haizhao Yang (4), Xiliang Lu (1, 2) — (1) School of Mathematics and Statistics, Wuhan University, China; (2) Hubei Key Laboratory of Computational Science, Wuhan University, China; (3) JD Explore Academy, China; (4) Department of Mathematics and Department of Computer Science, University of Maryland, College Park, USA
Pseudocode | No | The paper contains no sections or figures labeled 'Pseudocode' or 'Algorithm', nor any structured code-like blocks.
Open Source Code | No | The paper does not include any explicit statement about releasing source code for the methodology, nor any links to a code repository.
Open Datasets | No | The paper is theoretical and does not describe or use a specific dataset for training. It discusses data only in a general, theoretical context (e.g., 'C-mixing data') without providing access information for any particular dataset.
Dataset Splits | No | The paper is theoretical and does not describe training/validation/test dataset splits needed to reproduce empirical experiments.
Hardware Specification | No | The paper is theoretical and reports no experimental work that would require specific hardware, so no hardware specifications are provided.
Software Dependencies | No | The paper is theoretical and focuses on mathematical derivations; it does not mention any software dependencies with specific version numbers.
Experiment Setup | No | The paper is theoretical and conducts no experiments; it therefore provides no details on experimental setup, hyperparameters, or system-level training settings.