Corruption-Robust Offline Reinforcement Learning with General Function Approximation

Authors: Chenlu Ye, Rui Yang, Quanquan Gu, Tong Zhang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Motivated by our theoretical findings, we present a practical offline RL algorithm with uncertainty weighting and demonstrate its efficacy under diverse data corruption scenarios. Our practical implementation achieves a 104% improvement over the previous state-of-the-art uncertainty-based offline RL algorithm under data corruption, demonstrating its potential for effective deployment in real-world applications." Section 5 (Experiments): "Based on our theoretical results, we propose a practical implementation for CR-PEVI and verify its effectiveness on simulation tasks with corrupted offline data."
Researcher Affiliation | Academia | Chenlu Ye, The Hong Kong University of Science and Technology (cyeab@connect.ust.hk); Rui Yang, The Hong Kong University of Science and Technology (ryangam@connect.ust.hk); Quanquan Gu, University of California, Los Angeles (qgu@cs.ucla.edu); Tong Zhang, The Hong Kong University of Science and Technology (tongzhang@ust.hk)
Pseudocode | Yes | Algorithm 1: Uncertainty Weight Iteration ... Algorithm 2: CR-PEVI
Open Source Code | No | The paper does not state that code for the described method is released, nor does it link to a code repository.
Open Datasets | Yes | "We assess the performance of our approach using continuous control tasks from [15]..." Reference [15]: Fu, J., Kumar, A., Nachum, O., Tucker, G., and Levine, S. (2020). D4RL: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219.
Dataset Splits | No | The paper gives no explicit train/validation/test split details (e.g., percentages or sample counts) and does not mention cross-validation.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used to run the experiments.
Software Dependencies | No | The paper does not list software dependencies, libraries, or solvers with version numbers.
Experiment Setup | Yes | "The ensemble size K is set to 10 for all experiments. For evaluation, we report average returns with standard deviations over 10 random seeds. More implementation details are also provided in Appendix D."
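The checklist mentions uncertainty weighting (Algorithm 1, Uncertainty Weight Iteration) and an ensemble of size K = 10, but gives no formula. The following is a hypothetical illustration of how ensemble disagreement can be turned into per-sample weights for a Bellman regression loss; it is not the authors' code and not necessarily their exact rule. Assumptions: the uncertainty of a sample is the standard deviation of the K ensemble Q-estimates, the weight is 1 / max(1, u / alpha) with a hypothetical threshold `alpha`, and the Bellman targets are precomputed. All function names are illustrative.

```python
import statistics
from typing import Sequence


def ensemble_uncertainty(q_values: Sequence[float]) -> float:
    """Disagreement of the K ensemble members on one (state, action) pair,
    measured here (an assumption) as the population std of their Q-estimates."""
    return statistics.pstdev(q_values)


def uncertainty_weight(u: float, alpha: float) -> float:
    """Hypothetical weighting rule w = 1 / max(1, u / alpha).

    Samples with uncertainty at most alpha keep full weight 1; more uncertain
    (possibly corrupted) samples are down-weighted by the factor alpha / u.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 1.0 / max(1.0, u / alpha)


def weighted_bellman_loss(
    q_ensembles: Sequence[Sequence[float]],
    targets: Sequence[float],
    alpha: float,
) -> float:
    """Average of w_i * (Q_k(s_i, a_i) - y_i)^2 over samples i and members k.

    q_ensembles[i] holds the K ensemble Q-estimates for sample i, and
    targets[i] is its precomputed Bellman target y_i. In a deep-RL version
    the weights would be treated as constants (no gradient through them).
    """
    if len(q_ensembles) != len(targets) or len(targets) == 0:
        raise ValueError("need one target per sample and a non-empty batch")
    total, count = 0.0, 0
    for q_values, y in zip(q_ensembles, targets):
        w = uncertainty_weight(ensemble_uncertainty(q_values), alpha)
        for q in q_values:
            total += w * (q - y) ** 2
            count += 1
    return total / count
```

With the reported ensemble size K = 10, each inner list would hold ten Q-estimates. Down-weighting rather than discarding high-uncertainty samples keeps every transition in the loss while limiting how far any single, possibly corrupted, transition can pull the fit.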
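The Experiment Setup row reports average returns with standard deviations over 10 random seeds. A minimal sketch of that aggregation step follows. The paper does not say whether the sample (n - 1) or population standard deviation is used, so the sample version here is an assumption, and `summarize_returns` is a hypothetical helper name.

```python
import statistics
from typing import Sequence, Tuple


def summarize_returns(returns_per_seed: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of the final returns, one per seed."""
    if len(returns_per_seed) < 2:
        # statistics.stdev needs at least two data points
        raise ValueError("need returns from at least two seeds")
    return statistics.mean(returns_per_seed), statistics.stdev(returns_per_seed)
```

Called with the ten per-seed returns of one task and corruption setting, it yields the pair that would be reported as mean ± std.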