reproducibilityindex.ai

RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences

Authors: Jie Cheng, Gang Xiong, Xingyuan Dai, Qinghai Miao, Yisheng Lv, Fei-Yue Wang

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments on robotic manipulation and locomotion tasks demonstrate that RIME significantly enhances the robustness of the state-of-the-art PbRL method. Code is available at https://github.com/CJReinforce/RIME_ICML2024. Our experimental results indicate that RIME significantly outperforms existing baselines under noisy preference conditions, thereby substantially enhancing robustness for PbRL.
Researcher Affiliation	Academia	(1) State Key Laboratory of Multimodal Artificial Intelligence Systems, CASIA; (2) School of Artificial Intelligence, the University of Chinese Academy of Sciences. Correspondence to: Yisheng Lv <yisheng.lv@ia.ac.cn>.
Pseudocode	Yes	The full procedure of RIME is detailed in Appendix A. Algorithm 1 RIME
Open Source Code	Yes	Code is available at https://github.com/CJReinforce/RIME_ICML2024.
Open Datasets	Yes	We evaluate RIME on six complex tasks, including robotic manipulation tasks from Meta-world (Yu et al., 2020) and locomotion tasks from DMControl (Tassa et al., 2018; 2020).
Dataset Splits	No	The paper discusses 'pre-training' and 'online training' phases, and mentions 'validation' as a concept in the context of the process (e.g., 'validation' of models), but it does not provide specific numerical or percentage splits for training, validation, or test datasets.
Hardware Specification	No	The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, memory) used to run the experiments. It refers to 'robotic manipulation and locomotion tasks' which implies simulation, but no hardware details are given.
Software Dependencies	No	The paper mentions software components and algorithms (e.g., SAC, and PyTorch implicitly through its RL framework), but it does not give version numbers for these dependencies, which full reproducibility requires.
Experiment Setup	Yes	Implementation Details. For the hyperparameters of RIME, we fix α = 0.5, β_min = 1 and β_max = 3 in the lower bound τ_lower, and fix the upper bound τ_upper = 3 ln(10) for all experiments. The decay rate k in τ_upper is 1/30 for DMControl tasks, and 1/300 for Meta-world tasks, respectively. Other hyperparameters are kept the same as PEBBLE. The paper also includes detailed hyperparameter tables (Tables 9-14) for SAC, PEBBLE, SURF, RUNE, MRN, and RIME.
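
The hyperparameter values quoted in the Experiment Setup row can be collected into a small configuration object, which may help when checking a reproduction against the reported settings. The sketch below is illustrative only: the class name `RimeThresholdConfig`, the field names, the suite keys, and the `make_config` helper are all hypothetical and are not taken from the paper or from the RIME_ICML2024 repository. Only the numeric values come from the row above, and the sketch does not implement how RIME uses them.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RimeThresholdConfig:
    """Hypothetical container for the threshold hyperparameters quoted above."""
    alpha: float = 0.5                      # alpha in the lower bound tau_lower
    beta_min: float = 1.0                   # beta_min in the lower bound tau_lower
    beta_max: float = 3.0                   # beta_max in the lower bound tau_lower
    tau_upper: float = 3.0 * math.log(10)   # fixed upper bound, 3 ln(10)
    decay_rate_k: float = 1.0 / 30          # task-suite dependent, see below


# Decay rate k per benchmark suite, as reported in the Experiment Setup row.
# The dictionary keys are assumed names, not identifiers from the paper.
DECAY_RATE_BY_SUITE = {
    "dmcontrol": 1.0 / 30,
    "metaworld": 1.0 / 300,
}


def make_config(suite: str) -> RimeThresholdConfig:
    """Build the config for a suite; raises KeyError for unknown suites."""
    return RimeThresholdConfig(decay_rate_k=DECAY_RATE_BY_SUITE[suite.lower()])


if __name__ == "__main__":
    for suite in ("dmcontrol", "metaworld"):
        print(suite, make_config(suite))
```

All remaining hyperparameters are reported as identical to PEBBLE and are listed in the paper's hyperparameter tables, so they are deliberately left out of this sketch.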