Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Neural Network Approximation for Pessimistic Offline Reinforcement Learning
Authors: Di Wu, Yuling Jiao, Li Shen, Haizhao Yang, Xiliang Lu
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we establish a non-asymptotic estimation error of pessimistic offline RL using general neural network approximation with C-mixing data regarding the structure of networks, the dimension of datasets, and the concentrability of data coverage, under mild assumptions. Our result shows that the estimation error consists of two parts: the first converges to zero at a desired rate on the sample size with partially controllable concentrability, and the second becomes negligible if the residual constraint is tight. This result demonstrates the explicit efficiency of deep adversarial offline RL frameworks. |
| Researcher Affiliation | Collaboration | Di Wu1, Yuling Jiao1,2, Li Shen3, Haizhao Yang4*, Xiliang Lu1,2. Affiliations: 1 School of Mathematics and Statistics, Wuhan University, China; 2 Hubei Key Laboratory of Computational Science, Wuhan University, China; 3 JD Explore Academy, China; 4 Department of Mathematics and Department of Computer Science, University of Maryland, College Park, USA |
| Pseudocode | No | The paper does not contain any sections or figures labeled 'Pseudocode' or 'Algorithm', nor any structured code-like blocks. |
| Open Source Code | No | The paper does not include any explicit statement about releasing source code for the methodology or provide any links to a code repository. |
| Open Datasets | No | The paper is theoretical and does not describe or use a specific dataset for training. It discusses data in a general, theoretical context (e.g., 'C-mixing data') without providing access information for any particular dataset. |
| Dataset Splits | No | The paper is theoretical and does not describe or use training/validation/test dataset splits that would be needed to reproduce empirical experiments. |
| Hardware Specification | No | The paper is theoretical and does not report on experimental work that would require specific hardware. Thus, no hardware specifications are provided. |
| Software Dependencies | No | The paper is theoretical and focuses on mathematical derivations, thus it does not mention any software dependencies with specific version numbers. |
| Experiment Setup | No | The paper is theoretical and does not conduct experiments; it therefore provides no details on experimental setup, hyperparameters, or system-level training settings. |