Universal Morphology Control via Contextual Modulation

Authors: Zheng Xiong, Jacob Beck, Shimon Whiteson

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our method not only improves learning performance on a diverse set of training robots, but also generalizes better to unseen morphologies in a zero-shot fashion.
Researcher Affiliation | Academia | Department of Computer Science, University of Oxford, Oxford, United Kingdom. Correspondence to: Zheng Xiong <zheng.xiong@cs.ox.ac.uk>.
Pseudocode | No | The paper does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | The code is publicly available at https://github.com/MasterXiong/ModuMorph.
Open Datasets | Yes | We experiment on the UNIMAL task set as used in MetaMorph (Gupta et al., 2022), which includes 100 training robots and 100 test robots with diverse morphologies (Gupta et al., 2021).
Dataset Splits | No | The paper mentions '100 training robots and 100 test robots' and discusses validating the method's effectiveness, but it does not specify a distinct validation split for model tuning or any early-stopping criterion applied to a validation set during training.
Hardware Specification | No | The acknowledgements mention 'a generous equipment grant from NVIDIA' but do not specify any particular GPU model (e.g., A100, V100), CPU, or other hardware used for the experiments.
Software Dependencies | No | The paper mentions using 'PPO (Schulman et al., 2017) as the optimization algorithm' but does not specify the version of the PPO implementation or of any other software libraries (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | We train for 100M steps in FT, Incline, and Exploration, and 200M steps in VT and Obstacles, as they are more challenging to solve due to variable terrains. We run three random seeds for each method in each environment, and report the average performance and standard deviation. Following the same setup as in MetaMorph, we use PPO (Schulman et al., 2017) as the optimization algorithm. Similar to previous works (Dossa et al., 2021; Sun et al., 2022), we notice that the early stopping threshold has a significant influence on PPO performance (see Appendix B). We thus tune this hyperparameter over the candidate set of {0.03, 0.05} for each method in each environment. All the remaining hyperparameters follow the same setup as in MetaMorph for a fair comparison.
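In common PPO implementations, an early-stopping threshold of this kind bounds the approximate KL divergence between the old and updated policies: minibatch updates within an epoch halt once the KL estimate exceeds the threshold (here, 0.03 or 0.05). The paper does not show its implementation; the sketch below is a generic illustration of the mechanism, and the names `update_fn` and `kl_fn` are hypothetical placeholders, not from the authors' code.

```python
def ppo_epoch(minibatches, update_fn, kl_fn, kl_threshold=0.05):
    """Run one PPO epoch of minibatch updates, stopping early once the
    mean approximate KL between old and new policies exceeds kl_threshold.

    Returns the number of minibatch updates actually performed.
    """
    for i, batch in enumerate(minibatches):
        update_fn(batch)          # one gradient step on this minibatch
        if kl_fn(batch) > kl_threshold:
            return i + 1          # KL too large: stop this epoch early
    return len(minibatches)
```

Tuning the threshold (as the paper does over {0.03, 0.05}) trades off sample reuse per epoch against the risk of destructive policy updates.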