Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Meta-Evolve: Continuous Robot Evolution for One-to-many Policy Transfer
Authors: Xingyu Liu, Deepak Pathak, Ding Zhao
ICLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments have shown that our method is able to improve the efficiency of one-to-three transfer of manipulation policy by up to 3.2× and one-to-six transfer of agile locomotion policy by 2.4× in terms of simulation cost over the baseline of launching multiple independent one-to-one policy transfers. |
| Researcher Affiliation | Academia | Xingyu Liu, Deepak Pathak, Ding Zhao Carnegie Mellon University EMAIL |
| Pseudocode | Yes | Algorithm 1: Meta-Evolve; Algorithm 2: Determination of Evolution Tree and Meta Robots |
| Open Source Code | No | Supplementary videos available at the project website: https://sites.google.com/view/meta-evolve. The paper explicitly mentions 'videos' at the project website but does not state that source code for the methodology is available there or elsewhere. |
| Open Datasets | Yes | We showcase our Meta-Evolve on three Hand Manipulation Suite manipulation tasks (Rajeswaran et al., 2018)... The source robot is the Ant-v2 robot used in MuJoCo Gym (Brockman et al., 2016)... The source expert policy is trained by learning from the human hand demonstrations in DexYCB dataset (Chao et al., 2021). |
| Dataset Splits | No | The paper states a 'Success Rate Threshold for Moving to the Next Training Phase' of 66.7% and aims to 'reach 80% success rate on all three target robots'. However, it does not specify explicit dataset splits (e.g., 80/10/10) for training, validation, and testing data. |
| Hardware Specification | No | The paper mentions using PyTorch as the deep learning framework, NPG as the RL algorithm, and MuJoCo as the physics simulation engine. However, it does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | Yes | We use PyTorch (Paszke et al., 2019) as our deep learning framework and NPG (Rajeswaran et al., 2017) as the RL algorithm in all manipulation policy transfer and agile locomotion transfer experiments. We used MuJoCo (Todorov et al., 2012) as the physics simulation engine. |
| Experiment Setup | Yes | Hyperparameter Selection. We present the hyperparameters of our robot evolution and policy optimization in Table 4. Table 4 lists specific values for RL Discount Factor γ, GAE, NPG Step Size, Policy Network Hidden Layer Sizes, Value Network Hidden Layer Sizes, Simulation Epoch Length, RL Training Batch Size, Evolution Progression Step Size ξ, Number of Sampled Evolution Parameter Vectors for Jacobian Estimation in HERD Runs, Evolution Direction Weighting Factor λ, Sample Range Shrink Ratio, Success Rate Threshold for Moving to the Next Training Phase. |
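For readers attempting to reproduce the setup, the hyperparameter names quoted above can be gathered into one configuration mapping. This is a minimal sketch only: the numeric values below are hypothetical placeholders (the paper's actual values are in its Table 4), except for the 66.7% success-rate threshold, which is quoted in the Dataset Splits row.

```python
# Hyperparameter names taken from the Experiment Setup row above.
# All values marked "placeholder" are hypothetical, NOT the paper's Table 4 values.
meta_evolve_hyperparams = {
    "rl_discount_factor_gamma": 0.99,                    # placeholder
    "gae": 0.97,                                         # placeholder
    "npg_step_size": 0.05,                               # placeholder
    "policy_hidden_layer_sizes": (64, 64),               # placeholder
    "value_hidden_layer_sizes": (64, 64),                # placeholder
    "simulation_epoch_length": 200,                      # placeholder
    "rl_training_batch_size": 64,                        # placeholder
    "evolution_progression_step_size_xi": 0.01,          # placeholder
    "num_sampled_evolution_param_vectors_herd": 8,       # placeholder
    "evolution_direction_weighting_factor_lambda": 0.5,  # placeholder
    "sample_range_shrink_ratio": 0.9,                    # placeholder
    "success_rate_threshold_next_phase": 0.667,          # quoted in the report above
}

# Sanity check: every hyperparameter named in the table is represented once.
assert len(meta_evolve_hyperparams) == 12
```

Keeping the configuration in a single dict makes it easy to log alongside each run and to diff against the published Table 4 once the venue PDF values are filled in.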