Meta-Evolve: Continuous Robot Evolution for One-to-many Policy Transfer
Authors: Xingyu Liu, Deepak Pathak, Ding Zhao
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments have shown that our method is able to improve the efficiency of one-to-three transfer of manipulation policy by up to 3.2× and one-to-six transfer of agile locomotion policy by 2.4× in terms of simulation cost over the baseline of launching multiple independent one-to-one policy transfers. |
| Researcher Affiliation | Academia | Xingyu Liu, Deepak Pathak, Ding Zhao; Carnegie Mellon University; {xingyul3,dpathak,dingzhao}@andrew.cmu.edu |
| Pseudocode | Yes | Algorithm 1: Meta-Evolve; Algorithm 2: Determination of Evolution Tree and Meta Robots |
| Open Source Code | No | Supplementary videos available at the project website: https://sites.google.com/view/meta-evolve. The paper explicitly mentions 'videos' at the project website but does not state that source code for the methodology is available there or elsewhere. |
| Open Datasets | Yes | We showcase our Meta-Evolve on three Hand Manipulation Suite manipulation tasks (Rajeswaran et al., 2018)... The source robot is the Ant-v2 robot used in MuJoCo Gym (Brockman et al., 2016)... The source expert policy is trained by learning from the human hand demonstrations in the DexYCB dataset (Chao et al., 2021). |
| Dataset Splits | No | The paper states a 'Success Rate Threshold for Moving to the Next Training Phase' of 66.7% and aims to 'reach 80% success rate on all three target robots'. However, it does not specify explicit dataset splits (e.g., 80/10/10) for training, validation, and testing data. |
| Hardware Specification | No | The paper mentions using PyTorch as the deep learning framework, NPG as the RL algorithm, and MuJoCo as the physics simulation engine. However, it does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | Yes | We use PyTorch (Paszke et al., 2019) as our deep learning framework and NPG (Rajeswaran et al., 2017) as the RL algorithm in all manipulation policy transfer and agile locomotion transfer experiments. We used MuJoCo (Todorov et al., 2012) as the physics simulation engine. |
| Experiment Setup | Yes | Hyperparameter Selection. We present the hyperparameters of our robot evolution and policy optimization in Table 4. Table 4 lists specific values for RL Discount Factor γ, GAE, NPG Step Size, Policy Network Hidden Layer Sizes, Value Network Hidden Layer Sizes, Simulation Epoch Length, RL Training Batch Size, Evolution Progression Step Size ξ, Number of Sampled Evolution Parameter Vectors for Jacobian Estimation in HERD Runs, Evolution Direction Weighting Factor λ, Sample Range Shrink Ratio, and Success Rate Threshold for Moving to the Next Training Phase. |
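The Experiment Setup row names the hyperparameters reported in the paper's Table 4 but this summary does not reproduce their values. Below is a minimal, hypothetical Python sketch of how those fields might be organized for a reproduction attempt; every numeric value is a placeholder assumption, not a setting taken from the paper (except the 66.7% success-rate threshold quoted in the Dataset Splits row).

```python
# Hypothetical reproduction config for the hyperparameters named in Table 4.
# All values are placeholders, NOT the paper's reported settings.
from dataclasses import dataclass, field
from typing import List


@dataclass
class MetaEvolveConfig:
    # RL / NPG policy optimization settings (field names from Table 4, values assumed)
    rl_discount_factor_gamma: float = 0.99
    gae_lambda: float = 0.97
    npg_step_size: float = 0.01
    policy_hidden_layer_sizes: List[int] = field(default_factory=lambda: [64, 64])
    value_hidden_layer_sizes: List[int] = field(default_factory=lambda: [64, 64])
    simulation_epoch_length: int = 200
    rl_training_batch_size: int = 64

    # Robot evolution settings (field names from Table 4, values assumed)
    evolution_progression_step_size_xi: float = 0.05
    num_sampled_evolution_vectors_for_jacobian: int = 8  # used in HERD runs
    evolution_direction_weighting_factor_lambda: float = 0.5
    sample_range_shrink_ratio: float = 0.9
    success_rate_threshold_next_phase: float = 0.667  # 66.7%, as quoted from the paper


if __name__ == "__main__":
    # Instantiate and inspect the placeholder configuration.
    cfg = MetaEvolveConfig()
    print(cfg)
```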