PEAC: Unsupervised Pre-training for Cross-Embodiment Reinforcement Learning

Authors: Chengyang Ying, Zhongkai Hao, Xinning Zhou, Xuezhou Xu, Hang Su, Xingxing Zhang, Jun Zhu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments in both simulated (e.g., DMC and Robosuite) and real-world environments (e.g., legged locomotion) demonstrate that PEAC significantly improves adaptation performance and cross-embodiment generalization, demonstrating its effectiveness in overcoming the unique challenges of CEURL.
Researcher Affiliation | Collaboration | (1) Department of Computer Science & Technology, Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University; (2) Pazhou Lab (Huangpu), Guangzhou, China
Pseudocode | Yes | PEAC's pseudo-code is in Appendix C.
Open Source Code | Yes | The project page and code are at https://yingchengyang.github.io/ceurl.
Open Datasets | Yes | In simulations, we choose state-based / image-based DeepMind Control Suite (DMC) environments extending the Unsupervised RL Benchmark (URLB) [27] and different robotic arms in Robosuite [73]. Under these settings, PEAC demonstrates superior few-shot learning ability on downstream tasks and remarkable generalization ability to unseen embodiments, surpassing existing state-of-the-art unsupervised RL models. Besides, we have evaluated PEAC on real-world Aliengo robots by considering practical joint-failure settings based on Isaac Gym [37], verifying PEAC's strong adaptability to different joint failures and various real-world terrains.
Dataset Splits | No | The paper specifies pre-training and fine-tuning steps, and evaluates on unseen embodiments, but does not explicitly mention a separate validation dataset split used for hyperparameter tuning or model selection during training.
Hardware Specification | Yes | In experiments, all the agents are trained on a GeForce RTX 2080 Ti with an Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz.
Software Dependencies | No | The paper mentions specific software components like DDPG, DreamerV2, DrQ, PPO, and the Adam optimizer, and refers to codebases like URLB, but it does not specify exact version numbers for these software libraries or frameworks. For example, it refers to 'DDPG [33]' and 'DreamerV2 [18]' but not 'PyTorch 1.x' or specific library versions used from their implementations.
Experiment Setup | Yes | In this section, we will introduce more detailed information about our experiments. ... Here we introduce PEAC's hyper-parameters. For all settings, the hyper-parameters of the RL backbones (DDPG, DreamerV2, PPO) follow standard settings.
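Since the Software Dependencies entry notes that exact library versions are not reported, anyone reproducing the results may want to record their own software stack alongside the run. A minimal sketch in Python; the PyTorch check is an assumption for illustration, since the paper does not confirm which framework its codebase uses:

```python
import sys
import platform

# Log interpreter and OS details for a reproducibility record.
print("python:", sys.version.split()[0])
print("platform:", platform.platform())

# Deep-learning stack versions, guarded because the paper does not
# state which framework (or version) the official codebase depends on.
try:
    import torch  # assumption: a PyTorch-based reimplementation
    print("torch:", torch.__version__)
    print("cuda available:", torch.cuda.is_available())
except ImportError:
    print("torch: not installed")
```

Saving this output with each experiment fills the gap the review identifies: the hardware is documented in the paper, but the software versions must be pinned by the reproducer.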