Neural Network Approximation for Pessimistic Offline Reinforcement Learning

Authors: Di Wu, Yuling Jiao, Li Shen, Haizhao Yang, Xiliang Lu

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, the authors establish a non-asymptotic estimation error bound for pessimistic offline RL using general neural network approximation with C-mixing data, stated in terms of the network architecture, the dimension of the dataset, and the concentrability of the data coverage, under mild assumptions. The result shows that the estimation error consists of two parts: the first converges to zero at a desired rate in the sample size with partially controllable concentrability, and the second becomes negligible as the residual constraint tightens. This demonstrates the explicit efficiency of deep adversarial offline RL frameworks.
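The two-part bound described in the abstract can be sketched schematically as follows. This is an illustrative decomposition only, not the paper's exact statement: the symbols (the learned value function, the concentrability coefficient C, the rate α, and the residual term Δ(ε)) are assumptions chosen to mirror the abstract's description.

```latex
% Schematic form of the two-part error bound (illustrative; not the paper's exact result).
% \widehat{Q}: estimated value function; Q^{*}: target; n: sample size;
% C: (partially controllable) concentrability coefficient; \alpha > 0: a rate
% determined by the network class; \Delta(\epsilon): a term driven by the
% residual-constraint tolerance \epsilon.
\begin{equation*}
  \mathbb{E}\,\bigl\| \widehat{Q} - Q^{*} \bigr\|
  \;\lesssim\;
  \underbrace{C \cdot n^{-\alpha}}_{\text{vanishes as } n \to \infty}
  \;+\;
  \underbrace{\Delta(\epsilon)}_{\text{negligible as the residual constraint tightens}}
\end{equation*}
```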
Researcher Affiliation | Collaboration | Di Wu (1), Yuling Jiao (1, 2), Li Shen (3), Haizhao Yang (4), Xiliang Lu (1, 2) — (1) School of Mathematics and Statistics, Wuhan University, China; (2) Hubei Key Laboratory of Computational Science, Wuhan University, China; (3) JD Explore Academy, China; (4) Department of Mathematics and Department of Computer Science, University of Maryland, College Park, USA
Pseudocode | No | The paper contains no sections or figures labeled 'Pseudocode' or 'Algorithm', nor any structured code-like blocks.
Open Source Code | No | The paper does not include any explicit statement about releasing source code for the methodology, nor any links to a code repository.
Open Datasets | No | The paper is theoretical and does not describe or use a specific dataset for training. It discusses data only in a general, theoretical context (e.g., 'C-mixing data') without providing access information for any particular dataset.
Dataset Splits | No | The paper is theoretical and does not describe training/validation/test dataset splits needed to reproduce empirical experiments.
Hardware Specification | No | The paper is theoretical and reports no experimental work that would require specific hardware, so no hardware specifications are provided.
Software Dependencies | No | The paper is theoretical and focuses on mathematical derivations; it does not mention any software dependencies with specific version numbers.
Experiment Setup | No | The paper is theoretical and conducts no experiments; it therefore provides no details on experimental setup, hyperparameters, or system-level training settings.