PEAC: Unsupervised Pre-training for Cross-Embodiment Reinforcement Learning

Authors: Chengyang Ying, Zhongkai Hao, Xinning Zhou, Xuezhou Xu, Hang Su, Xingxing Zhang, Jun Zhu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments in both simulated (e.g., DMC and Robosuite) and real-world environments (e.g., legged locomotion) demonstrate that PEAC significantly improves adaptation performance and cross-embodiment generalization, demonstrating its effectiveness in overcoming the unique challenges of CEURL.
Researcher Affiliation | Collaboration | (1) Department of Computer Science & Technology, Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University; (2) Pazhou Lab (Huangpu), Guangzhou, China
Pseudocode | Yes | PEAC's pseudo-code is in Appendix C.
Open Source Code | Yes | The project page and code are at https://yingchengyang.github.io/ceurl.
Open Datasets | Yes | In simulations, we choose state-based / image-based DeepMind Control Suite (DMC) environments extending the Unsupervised RL Benchmark (URLB) [27] and different robotic arms in Robosuite [73]. Under these settings, PEAC demonstrates superior few-shot learning ability on downstream tasks and remarkable generalization ability to unseen embodiments, surpassing existing state-of-the-art unsupervised RL models. Besides, we have evaluated PEAC on real-world Aliengo robots by considering practical joint-failure settings based on Isaac Gym [37], verifying PEAC's strong adaptability to different joint failures and various real-world terrains.
Dataset Splits | No | The paper specifies pre-training and fine-tuning steps, and evaluates on unseen embodiments, but does not explicitly mention a separate validation dataset split used for hyperparameter tuning or model selection during training.
Hardware Specification | Yes | In experiments, all the agents are trained on a GeForce RTX 2080 Ti with an Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz.
Software Dependencies | No | The paper mentions specific software components like DDPG, DreamerV2, DrQ, PPO, and the Adam optimizer, and refers to codebases like URLB, but it does not specify exact version numbers for these software libraries or frameworks. For example, it refers to 'DDPG [33]' and 'DreamerV2 [18]' but not 'PyTorch 1.x' or specific library versions used from their implementations.
Experiment Setup | Yes | In this section, we will introduce more detailed information about our experiments. ... Here we introduce PEAC's hyper-parameters. For all settings, the hyper-parameters of the RL backbones (DDPG, DreamerV2, PPO) follow standard settings.
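Since the Software Dependencies entry notes that exact library versions are not reported, anyone reproducing the results may want to record their own software stack alongside the run. A minimal sketch in Python; the PyTorch check is an assumption for illustration, since the paper does not confirm which framework its codebase uses:

```python
import sys
import platform

# Log interpreter and OS details for a reproducibility record.
print("python:", sys.version.split()[0])
print("platform:", platform.platform())

# Deep-learning stack versions, guarded because the paper does not
# state which framework (or version) the official codebase depends on.
try:
    import torch  # assumption: a PyTorch-based reimplementation
    print("torch:", torch.__version__)
    print("cuda available:", torch.cuda.is_available())
except ImportError:
    print("torch: not installed")
```

Saving this output with each experiment fills the gap the review identifies: the hardware is documented in the paper, but the software versions must be pinned by the reproducer.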