Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption
Authors: Chenlu Ye, Jiafan He, Quanquan Gu, Tong Zhang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove that CR-OMLE achieves a regret of $\tilde{O}(\sqrt{T} + C)$, where C denotes the cumulative corruption level after T episodes. We also prove a lower bound to show that the additive dependence on C is optimal. To the best of our knowledge, this is the first work on corruption-robust model-based RL algorithms with provable guarantees. |
| Researcher Affiliation | Academia | The Hong Kong University of Science and Technology; University of California, Los Angeles; University of Illinois Urbana-Champaign. |
| Pseudocode | Yes | Algorithm 1 Corruption-Robust Optimistic MLE (CR-OMLE), Algorithm 2 Corruption-Robust Pessimistic MLE (CR-PMLE), Algorithm 3 Uncertainty Weight Iteration |
| Open Source Code | No | The paper does not state that its code is open source and provides no links to code repositories. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on specific datasets, nor does it provide any concrete access information for any datasets. |
| Dataset Splits | No | The paper is theoretical and involves no experimental validation with dataset splits. Any mention of 'train', 'validation', or 'test' refers to general RL concepts, not to data partitions used in this work. |
| Hardware Specification | No | The paper focuses on theoretical analysis and algorithm design and does not report on experiments that would require hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not report on experiments requiring specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and proofs; it therefore provides no experimental setup details such as hyperparameter values or training configurations. |