Meta-Evolve: Continuous Robot Evolution for One-to-many Policy Transfer

Authors: Xingyu Liu, Deepak Pathak, Ding Zhao

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments have shown that our method is able to improve the efficiency of one-to-three transfer of manipulation policy by up to 3.2× and one-to-six transfer of agile locomotion policy by 2.4× in terms of simulation cost over the baseline of launching multiple independent one-to-one policy transfers.
Researcher Affiliation | Academia | Xingyu Liu, Deepak Pathak, Ding Zhao; Carnegie Mellon University; {xingyul3,dpathak,dingzhao}@andrew.cmu.edu
Pseudocode | Yes | Algorithm 1: Meta-Evolve; Algorithm 2: Determination of Evolution Tree and Meta Robots
Open Source Code | No | Supplementary videos available at the project website: https://sites.google.com/view/meta-evolve. The paper explicitly mentions 'videos' at the project website but does not state that source code for the methodology is available there or elsewhere.
Open Datasets | Yes | We showcase our Meta-Evolve on three Hand Manipulation Suite manipulation tasks (Rajeswaran et al., 2018)... The source robot is the Ant-v2 robot used in MuJoCo Gym (Brockman et al., 2016)... The source expert policy is trained by learning from the human hand demonstrations in the DexYCB dataset (Chao et al., 2021).
Dataset Splits | No | The paper states a 'Success Rate Threshold for Moving to the Next Training Phase' of 66.7% and aims to 'reach 80% success rate on all three target robots'. However, it does not specify explicit dataset splits (e.g., 80/10/10) for training, validation, and testing data.
Hardware Specification | No | The paper mentions using PyTorch as the deep learning framework, NPG as the RL algorithm, and MuJoCo as the physics simulation engine. However, it does not provide specific details about the hardware (e.g., CPU and GPU models, memory) used to run the experiments.
Software Dependencies | Yes | We use PyTorch (Paszke et al., 2019) as our deep learning framework and NPG (Rajeswaran et al., 2017) as the RL algorithm in all manipulation policy transfer and agile locomotion transfer experiments. We used MuJoCo (Todorov et al., 2012) as the physics simulation engine.
Experiment Setup | Yes | Hyperparameter Selection. We present the hyperparameters of our robot evolution and policy optimization in Table 4. Table 4 lists specific values for the RL Discount Factor γ, GAE, NPG Step Size, Policy Network Hidden Layer Sizes, Value Network Hidden Layer Sizes, Simulation Epoch Length, RL Training Batch Size, Evolution Progression Step Size ξ, Number of Sampled Evolution Parameter Vectors for Jacobian Estimation in HERD Runs, Evolution Direction Weighting Factor λ, Sample Range Shrink Ratio, and Success Rate Threshold for Moving to the Next Training Phase.
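As a rough illustration of how the Table 4 hyperparameters could be organized, the sketch below groups them into a single Python configuration object, with a helper that applies the 66.7% phase-progression threshold quoted above. All numeric values other than that threshold are placeholders rather than the paper's reported settings, and the names `MetaEvolveConfig` and `should_advance_phase` are hypothetical, not taken from the paper or its code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MetaEvolveConfig:
    """Illustrative container for the hyperparameters listed in Table 4.
    All values are placeholders except the success-rate threshold quoted above."""
    # RL / NPG optimization (placeholder values)
    rl_discount_gamma: float = 0.99              # RL Discount Factor γ
    gae_lambda: float = 0.97                     # GAE parameter
    npg_step_size: float = 1e-4                  # NPG Step Size
    policy_hidden_sizes: List[int] = field(default_factory=lambda: [64, 64])
    value_hidden_sizes: List[int] = field(default_factory=lambda: [64, 64])
    sim_epoch_length: int = 200                  # Simulation Epoch Length
    rl_batch_size: int = 16                      # RL Training Batch Size
    # Robot evolution (placeholder values)
    evolution_step_size_xi: float = 0.05         # Evolution Progression Step Size ξ
    num_sampled_evolution_vectors: int = 8       # vectors sampled for Jacobian estimation
    evolution_direction_weight_lambda: float = 0.1  # Evolution Direction Weighting Factor λ
    sample_range_shrink_ratio: float = 0.9       # Sample Range Shrink Ratio
    # Training-phase progression
    success_rate_threshold: float = 0.667        # 66.7% threshold quoted in the report


def should_advance_phase(success_rate: float, cfg: MetaEvolveConfig) -> bool:
    """Move to the next training phase once the policy's evaluated
    success rate reaches the configured threshold."""
    return success_rate >= cfg.success_rate_threshold


if __name__ == "__main__":
    cfg = MetaEvolveConfig()
    print(should_advance_phase(0.70, cfg))  # True: 70% >= 66.7%
    print(should_advance_phase(0.50, cfg))  # False: 50% < 66.7%
```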