Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption

Authors: Chenlu Ye, Jiafan He, Quanquan Gu, Tong Zhang

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We prove that CR-OMLE achieves a regret of Õ(√T + C), where C denotes the cumulative corruption level after T episodes. We also prove a lower bound to show that the additive dependence on C is optimal. To the best of our knowledge, this is the first work on corruption-robust model-based RL algorithms with provable guarantees. (The regret claim is restated more explicitly after the table.)
Researcher Affiliation | Academia | The Hong Kong University of Science and Technology; University of California, Los Angeles; University of Illinois Urbana-Champaign.
Pseudocode | Yes | Algorithm 1: Corruption-Robust Optimistic MLE (CR-OMLE); Algorithm 2: Corruption-Robust Pessimistic MLE (CR-PMLE); Algorithm 3: Uncertainty Weight Iteration. (An illustrative weighted-MLE sketch follows the table.)
Open Source Code | No | The paper does not provide any statement about making its code open source or links to code repositories.
Open Datasets | No | The paper is theoretical and does not conduct experiments on specific datasets, nor does it provide any concrete access information for datasets.
Dataset Splits | No | The paper is theoretical and does not involve experimental validation with dataset splits. It refers to 'train', 'validation', and 'test' as general concepts in RL, not as specific data partitions used in its own work.
Hardware Specification | No | The paper focuses on theoretical analysis and algorithm design and does not report experiments that would require hardware specifications.
Software Dependencies | No | The paper is theoretical and does not report experiments requiring specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and focuses on algorithm design and proofs, so it does not provide experimental setup details such as hyperparameter values or training configurations.
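
For context on the Research Type row above, the regret claim can be written out more explicitly. The restatement below uses the standard episodic definition of regret and is based only on the abstract-level claim quoted in the table; the formal, TV-distance-based definition of the corruption level C and all logarithmic factors are as specified in the paper.

% Regret over T episodes, with pi_t the policy executed in episode t
% and V* the optimal value:
\[
  \mathrm{Regret}(T) \;=\; \sum_{t=1}^{T} \bigl( V^{*} - V^{\pi_t} \bigr)
  \;=\; \widetilde{\mathcal{O}}\bigl( \sqrt{T} + C \bigr),
\]
% where C is the cumulative corruption level injected by the adversary
% over the T episodes. The paper's lower bound shows that the additive
% dependence on C cannot be improved.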
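
The Pseudocode row lists the paper's algorithms but does not reproduce them. As a purely illustrative aid, here is a minimal Python sketch of the generic idea behind uncertainty-weighted maximum likelihood estimation: down-weight suspicious (high-uncertainty) samples before fitting. The clipping rule w_i = min(1, alpha / u_i), the function names, and the toy Bernoulli example are assumptions made for this sketch; they are not the paper's Algorithm 3, which derives its weights from TV-based information ratios via a fixed-point iteration.

import numpy as np

def weighted_mle(log_lik, candidates, data, uncertainties, alpha=1.0):
    # Illustrative down-weighting rule (an assumption for this sketch):
    # w_i = min(1, alpha / u_i) shrinks the influence of high-uncertainty samples.
    w = np.minimum(1.0, alpha / np.maximum(uncertainties, 1e-12))
    # Weighted MLE: choose the candidate maximizing the weighted log-likelihood.
    scores = [np.sum(w * log_lik(theta, data)) for theta in candidates]
    return candidates[int(np.argmax(scores))]

# Toy usage: estimate a Bernoulli mean when a few samples are adversarially set to 1.
rng = np.random.default_rng(0)
clean = rng.binomial(1, 0.3, size=95).astype(float)
corrupted = np.ones(5)
data = np.concatenate([clean, corrupted])
uncertainties = np.concatenate([np.ones(95), 10.0 * np.ones(5)])  # corrupted samples flagged as uncertain

def bernoulli_log_lik(theta, x):
    return x * np.log(theta) + (1.0 - x) * np.log(1.0 - theta)

candidates = np.linspace(0.05, 0.95, 19)
print(weighted_mle(bernoulli_log_lik, candidates, data, uncertainties))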