RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences
Authors: Jie Cheng, Gang Xiong, Xingyuan Dai, Qinghai Miao, Yisheng Lv, Fei-Yue Wang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on robotic manipulation and locomotion tasks demonstrate that RIME significantly enhances the robustness of the state-of-the-art PbRL method. Code is available at https://github.com/CJReinforce/RIME_ICML2024. Our experimental results indicate that RIME significantly outperforms existing baselines under noisy preference conditions, thereby substantially enhancing robustness for PbRL. |
| Researcher Affiliation | Academia | ¹State Key Laboratory of Multimodal Artificial Intelligence Systems, CASIA; ²School of Artificial Intelligence, the University of Chinese Academy of Sciences. Correspondence to: Yisheng Lv <yisheng.lv@ia.ac.cn>. |
| Pseudocode | Yes | The full procedure of RIME is detailed in Appendix A. Algorithm 1 RIME |
| Open Source Code | Yes | Code is available at https://github.com/CJReinforce/RIME_ICML2024. |
| Open Datasets | Yes | We evaluate RIME on six complex tasks, including robotic manipulation tasks from Meta-world (Yu et al., 2020) and locomotion tasks from DMControl (Tassa et al., 2018; 2020). |
| Dataset Splits | No | The paper discusses 'pre-training' and 'online training' phases, and mentions 'validation' as a concept in the context of the process (e.g., 'validation' of models), but it does not provide specific numerical or percentage splits for training, validation, or test datasets. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, memory) used to run the experiments. It refers to 'robotic manipulation and locomotion tasks' which implies simulation, but no hardware details are given. |
| Software Dependencies | No | The paper mentions software components and algorithms (e.g., 'SAC', 'PyTorch' implicitly through RL framework), but it does not provide specific version numbers for these software dependencies, which are required for full reproducibility. |
| Experiment Setup | Yes | Implementation Details. For the hyperparameters of RIME, we fix α = 0.5, β_min = 1 and β_max = 3 in the lower bound τ_lower, and fix the upper bound τ_upper = 3 ln(10) for all experiments. The decay rate k in τ_upper is 1/30 for DMControl tasks, and 1/300 for Meta-world tasks, respectively. Other hyperparameters are kept the same as PEBBLE. The paper also includes detailed hyperparameter tables (Tables 9 to 14) for SAC, PEBBLE, SURF, RUNE, MRN, and RIME. |
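
The RIME-specific hyperparameters quoted in the Experiment Setup row can be gathered into a small configuration object, as in the minimal sketch below. The names `RimeHyperparams`, `DECAY_RATE_K`, and `rime_config` are hypothetical and do not come from the paper or its repository; only the numeric values are taken from the quoted text. How these values enter the thresholds is defined in the paper and is not reproduced here.

```python
import math
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RimeHyperparams:
    """RIME-specific values quoted in the Experiment Setup row (names are illustrative)."""
    alpha: float = 0.5                   # α, fixed for all experiments
    beta_min: float = 1.0                # β_min
    beta_max: float = 3.0                # β_max
    tau_upper: float = 3 * math.log(10)  # τ_upper = 3 ln(10), natural logarithm


# Decay rate k per benchmark suite, as quoted: 1/30 for DMControl, 1/300 for Meta-world.
# The dictionary keys are hypothetical labels chosen for this sketch.
DECAY_RATE_K = {"dmcontrol": 1 / 30, "metaworld": 1 / 300}


def rime_config(domain: str) -> dict:
    """Return the quoted hyperparameters plus the decay rate k for the given suite."""
    if domain not in DECAY_RATE_K:
        raise ValueError(f"unknown domain: {domain!r}")
    config = asdict(RimeHyperparams())
    config["decay_rate_k"] = DECAY_RATE_K[domain]
    return config
```

Calling `rime_config("metaworld")` returns a plain dictionary that could be merged with the PEBBLE defaults, which the paper reports keeping unchanged for all other hyperparameters.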