AnyMorph: Learning Transferable Policies By Inferring Agent Morphology

Authors: Brandon Trabucco, Mariano Phielipp, Glen Berseth

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on the standard benchmark for agent-agnostic control, and improve over the current state of the art in zero-shot generalization to new agents. Importantly, our method attains good performance without an explicit description of morphology.
Researcher Affiliation | Collaboration | Brandon Trabucco (1), Mariano Phielipp* (2), Glen Berseth* (3). 1 Machine Learning Department, Carnegie Mellon University, work done while at Intel AI; 2 Intel AI; 3 Mila.
Pseudocode | No | No explicit pseudocode or algorithm block is present in the paper. The methodology is described in prose and through diagrams.
Open Source Code | Yes | Additionally, we have released the source code for our method and summarized how the model works at the following site.
Open Datasets | Yes | To answer these questions, we leverage a benchmark for agent-agnostic reinforcement learning developed by Huang et al. (2020, p. 1). This benchmark contains a set of eight reinforcement learning tasks... The agents present in this benchmark are inspired by and derived from standard OpenAI Gym tasks: HalfCheetah-v2, Walker2d-v2, Hopper-v2, and Humanoid-v2 (Brockman et al., 2016). (See the environment-setup sketch below the table.)
Dataset Splits | Yes | To answer this question, we follow Kurin et al. (2021) and hold out 3 Cheetahs, 2 Walkers, and 2 Humanoids respectively. See Appendix C for which specific morphologies are used for testing. (See the hold-out split sketch below the table.)
Hardware Specification | Yes | Our model fits on a single Nvidia 2080 Ti GPU, and requires seven days of training to reach 3 million environment steps.
Software Dependencies | No | The paper mentions using TD3, MuJoCo-like agents, and OpenAI Gym tasks, but does not specify software dependencies such as programming language or library versions (e.g., Python version, PyTorch version) needed for a reproducible setup.
Experiment Setup | Yes | We provide a table of hyperparameters in Appendix A for our policy and reinforcement learning optimizer. (See the illustrative hyperparameter sketch below the table.)
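
For the "Open Datasets" row, the base tasks named in the paper are standard OpenAI Gym MuJoCo environments. The snippet below is a minimal sketch of how those base environments can be instantiated, assuming the legacy `gym` package with `mujoco-py` support is installed; the morphology variants in the Huang et al. (2020) benchmark are derived from these tasks and are not produced by this snippet.

```python
# Minimal sketch: instantiate the standard Gym base tasks named in the paper.
# Assumes the legacy `gym` package (pre-0.26 API) with mujoco-py installed.
import gym

BASE_TASKS = ["HalfCheetah-v2", "Walker2d-v2", "Hopper-v2", "Humanoid-v2"]

for task_name in BASE_TASKS:
    env = gym.make(task_name)
    obs = env.reset()  # legacy API: reset() returns only the observation
    print(task_name, env.observation_space.shape, env.action_space.shape)
    env.close()
```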
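
For the "Dataset Splits" row, the paper holds out 3 Cheetah, 2 Walker, and 2 Humanoid morphologies for zero-shot testing, with the exact identities listed in Appendix C. The sketch below only illustrates how such a hold-out split could be encoded; the morphology names are hypothetical placeholders, not the ones used in the paper.

```python
# Hypothetical hold-out split mirroring the counts reported in the paper
# (3 Cheetahs, 2 Walkers, 2 Humanoids). Names are placeholders; see the
# paper's Appendix C for the actual held-out morphologies.
HELD_OUT = {
    "cheetah": ["cheetah_variant_1", "cheetah_variant_2", "cheetah_variant_3"],
    "walker": ["walker_variant_1", "walker_variant_2"],
    "humanoid": ["humanoid_variant_1", "humanoid_variant_2"],
}

def split_morphologies(all_morphologies, held_out_names):
    """Partition morphology names into a training set and a zero-shot test set."""
    test = [m for m in all_morphologies if m in held_out_names]
    train = [m for m in all_morphologies if m not in held_out_names]
    return train, test
```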
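
For the "Experiment Setup" row, the actual hyperparameter table is in Appendix A of the paper and is not reproduced here. As an illustration of the kind of settings such a table covers, the dictionary below lists the default TD3 hyperparameters from Fujimoto et al. (2018); these are standard values, not the ones reported by the authors.

```python
# Default TD3 hyperparameters from Fujimoto et al. (2018), shown only to
# illustrate what a policy/optimizer hyperparameter table typically contains.
# These are NOT the values from Appendix A of the AnyMorph paper.
TD3_DEFAULTS = {
    "discount": 0.99,          # gamma
    "tau": 0.005,              # soft target-network update rate
    "policy_noise": 0.2,       # std of target-policy smoothing noise
    "noise_clip": 0.5,         # clipping range for the smoothing noise
    "policy_delay": 2,         # actor updated every 2 critic updates
    "exploration_noise": 0.1,  # std of Gaussian action noise during rollouts
    "batch_size": 100,
    "learning_rate": 1e-3,     # Adam, for both actor and critics
}
```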