Corruption-Robust Offline Reinforcement Learning with General Function Approximation
Authors: Chenlu Ye, Rui Yang, Quanquan Gu, Tong Zhang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Motivated by our theoretical findings, we present a practical offline RL algorithm with uncertainty weighting and demonstrate its efficacy under diverse data corruption scenarios. Our practical implementation achieves a 104% improvement over the previous state-of-the-art uncertainty-based offline RL algorithm under data corruption, demonstrating its potential for effective deployment in real-world applications. (Section 5, Experiments) Based on our theoretical results, we propose a practical implementation for CR-PEVI and verify its effectiveness on simulation tasks with corrupted offline data. |
| Researcher Affiliation | Academia | Chenlu Ye, The Hong Kong University of Science and Technology, cyeab@connect.ust.hk; Rui Yang, The Hong Kong University of Science and Technology, ryangam@connect.ust.hk; Quanquan Gu, University of California, Los Angeles, qgu@cs.ucla.edu; Tong Zhang, The Hong Kong University of Science and Technology, tongzhang@ust.hk |
| Pseudocode | Yes | Algorithm 1: Uncertainty Weight Iteration ... Algorithm 2: CR-PEVI |
| Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We assess the performance of our approach using continuous control tasks from [15]... [15] Fu, J., Kumar, A., Nachum, O., Tucker, G., and Levine, S. (2020). D4RL: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219. |
| Dataset Splits | No | No explicit details on train/validation/test dataset splits (e.g., percentages, sample counts) or the use of cross-validation are provided in the paper. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, cloud instance types) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies, libraries, or solvers with version numbers. |
| Experiment Setup | Yes | The ensemble size K is set to 10 for all experiments. For evaluation, we report average returns with standard deviations over 10 random seeds. More implementation details are also provided in Appendix D. |
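
The Experiment Setup row says returns are reported as averages with standard deviations over 10 random seeds. A minimal sketch of that aggregation step follows. The function name `summarize_returns`, the placeholder return values, and the use of the sample (n - 1) standard deviation are all assumptions made for illustration; the source does not say which standard deviation convention the authors used.

```python
import statistics


def summarize_returns(returns_per_seed):
    """Return (mean, std) of evaluation returns, one entry per random seed.

    Hypothetical helper, not taken from the paper or its code.
    """
    if len(returns_per_seed) < 2:
        raise ValueError("need at least two seeds to estimate a standard deviation")
    mean = statistics.fmean(returns_per_seed)
    # Assumption: sample standard deviation (n - 1 denominator).
    # Use statistics.pstdev instead for the population convention.
    std = statistics.stdev(returns_per_seed)
    return mean, std


# Placeholder returns for 10 seeds (illustrative numbers, not results from the paper).
example_returns = [10.0, 12.0, 11.0, 9.0, 13.0, 10.0, 12.0, 11.0, 9.0, 13.0]
mean, std = summarize_returns(example_returns)
print(f"{mean:.2f} ± {std:.2f}")
```

When comparing against reported numbers, it may be worth checking both `stdev` and `pstdev`, since the two conventions differ noticeably at only 10 seeds.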