MCP: Learning Composable Hierarchical Control with Multiplicative Compositional Policies
Authors: Xue Bin Peng, Michael Chang, Grace Zhang, Pieter Abbeel, Sergey Levine
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that MCP is able to extract composable skills for highly complex simulated characters from pre-training tasks, such as motion imitation, and then reuse these skills to solve challenging continuous control tasks, such as dribbling a soccer ball to a goal, and picking up an object and transporting it to a target location. |
| Researcher Affiliation | Academia | Xue Bin Peng, Michael Chang, Grace Zhang, Pieter Abbeel, Sergey Levine Department of Electrical Engineering and Computer Science University of California, Berkeley {xbpeng, mbchang, grace.zhang}@berkeley.edu pabbeel@cs.berkeley.edu svlevine@eecs.berkeley.edu |
| Pseudocode | Yes | Algorithm 1 MCP Pre-Training and Transfer (a schematic of this pre-train/transfer loop appears after the table). |
| Open Source Code | No | Supplementary video: xbpeng.github.io/projects/MCP/. This URL points to a project/video page, not a direct code repository link. The paper does not explicitly state that source code for the methodology is released or available via a code repository. |
| Open Datasets | Yes | We use a motion imitation approach following Peng et al. [32]... The corpus of motion clips is comprised of different walking and turning motions. The environment is a variant of the standard Gym Ant environment [4]. SFU motion capture database, http://mocap.cs.sfu.ca/ [38]. |
| Dataset Splits | No | No specific details on training/validation/test dataset splits (exact percentages, sample counts, or citations to predefined splits) were explicitly provided for any of the experiments, nor was cross-validation mentioned. |
| Hardware Specification | No | We would like to thank AWS, Google, and NVIDIA for providing computational resources. This statement is too general and does not specify particular hardware models (e.g., specific GPUs, CPUs, or cloud instance types). |
| Software Dependencies | No | The policies operate at 30Hz and are trained using proximal policy optimization (PPO) [37]. No specific software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow, CUDA) are explicitly provided. |
| Experiment Setup | Yes | All experiments use a similar network architecture for the policy, as illustrated in Figure 3. Each policy is composed of k = 8 primitives. The gating function and primitives are modeled by separate networks that output w(s, g), µ1:k(s), and Σ1:k(s), which are then composed according to Equation 2 to produce the composite policy. The state describes the configuration of the character's body, with features consisting of the relative positions of each link with respect to the root, their rotations represented by quaternions, and their linear and angular velocities. Actions from the policy specify target rotations for PD controllers positioned at each joint. Target rotations for 3D spherical joints are parameterized using exponential maps. The policies operate at 30Hz and are trained using proximal policy optimization (PPO) [37]. (Equation 2 and the exponential-map parameterization are sketched after the table.) |
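
The Experiment Setup row references Equation 2 of the paper, which composes the k Gaussian primitives multiplicatively: per action dimension, the composite variance is the inverse of the sum of weight-scaled primitive precisions, and the composite mean is the corresponding precision-weighted average of the primitive means. Below is a minimal NumPy sketch of that composition; the function and variable names are our own, not taken from the authors' code.

```python
import numpy as np

def mcp_compose(w, mu, sigma):
    """Compose k Gaussian primitives multiplicatively (MCP, Eq. 2).

    w:     (k,)    non-negative gating weights w_i(s, g)
    mu:    (k, d)  per-primitive action means mu_i(s)
    sigma: (k, d)  per-primitive action std devs sigma_i(s)

    Returns the mean and std dev of the composite Gaussian policy.
    """
    # Weight-scaled precision of each primitive, per action dimension.
    scaled_prec = w[:, None] / sigma**2              # (k, d)
    # Composite variance: inverse of the summed scaled precisions.
    var = 1.0 / scaled_prec.sum(axis=0)              # (d,)
    # Composite mean: precision-weighted average of primitive means.
    mean = var * (scaled_prec * mu).sum(axis=0)      # (d,)
    return mean, np.sqrt(var)

# Toy usage: k = 8 primitives over a d-dimensional action space.
rng = np.random.default_rng(0)
k, d = 8, 4
w = rng.dirichlet(np.ones(k))                        # stand-in gating output
mu = rng.normal(size=(k, d))
sigma = rng.uniform(0.1, 1.0, size=(k, d))
mean, std = mcp_compose(w, mu, sigma)
action = rng.normal(mean, std)                       # sample from the composite
```

Because the composition happens in distribution space rather than action space, every primitive can influence every action dimension simultaneously, which is what distinguishes MCP from additive mixture-of-experts composition.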
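
The Pseudocode row cites Algorithm 1, which runs in two stages: pre-train the gating function and primitives jointly with PPO on the imitation tasks, then freeze the primitives and train a freshly initialized gating function on the transfer task. The following is a schematic Python skeleton of that control flow under those assumptions; the parameter shapes are arbitrary and the PPO update is replaced by a placeholder step, so none of this reflects the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
k, state_dim, goal_dim = 8, 6, 3

def init_params():
    return {
        "gating": rng.normal(size=(state_dim + goal_dim, k)),   # w(s, g)
        "primitives": rng.normal(size=(k, state_dim)),          # primitive heads
    }

def placeholder_update(params, trainable, lr=1e-3):
    # Stand-in for a PPO update: modify only the trainable parameter groups.
    for name in trainable:
        params[name] -= lr * rng.normal(size=params[name].shape)

# Stage 1: pre-training on imitation tasks (all parameters trainable).
params = init_params()
for _ in range(1000):
    placeholder_update(params, trainable=["gating", "primitives"])

# Stage 2: transfer. Keep the learned primitives fixed, re-initialize the
# gating function for the new goal space, and train only the gating.
params["gating"] = rng.normal(size=(state_dim + goal_dim, k))
for _ in range(1000):
    placeholder_update(params, trainable=["gating"])
```

Freezing the primitives during transfer is what makes the skills reusable: only the low-dimensional gating weights are adapted to the new task.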
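
The Experiment Setup row also notes that PD target rotations for 3D spherical joints are parameterized with exponential maps. For reference, an exponential-map vector v encodes a rotation of angle ||v|| radians about the axis v/||v||; the helper below converting it to a quaternion is a standard utility of our own, not code from the paper.

```python
import numpy as np

def exp_map_to_quaternion(v, eps=1e-8):
    """Convert an exponential-map vector to a unit quaternion (w, x, y, z)."""
    angle = np.linalg.norm(v)
    if angle < eps:
        return np.array([1.0, 0.0, 0.0, 0.0])   # near-zero vector: identity
    axis = v / angle
    half = 0.5 * angle
    return np.concatenate(([np.cos(half)], np.sin(half) * axis))

# A 90-degree target rotation about the y-axis.
q = exp_map_to_quaternion(np.array([0.0, np.pi / 2, 0.0]))
```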