Neural Probabilistic Motor Primitives for Humanoid Control

Authors: Josh Merel, Leonard Hasenclever, Alexandre Galashov, Arun Ahuja, Vu Pham, Greg Wayne, Yee Whye Teh, Nicolas Heess

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that it is possible to train this model entirely offline to compress thousands of expert policies and learn a motor primitive embedding space. The trained neural probabilistic motor primitive system can perform one-shot imitation of whole-body humanoid behaviors, robustly mimicking unseen trajectories. Additionally, we demonstrate that it is also straightforward to train controllers to reuse the learned motor primitive space to solve tasks, and the resulting movements are relatively naturalistic. To support the training of our model, we compare two approaches for offline policy cloning...
Researcher Affiliation | Industry | Josh Merel, Leonard Hasenclever, Alexandre Galashov, Arun Ahuja, Vu Pham, Greg Wayne, Yee Whye Teh, & Nicolas Heess; DeepMind, London, UK; {jsmerel,leonardh,agalashov,arahuja,vuph,gregwayne,ywteh,heess}@google.com
Pseudocode | No | The paper describes algorithms and methods but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement about making its source code available, nor a link to a code repository for the described methodology.
Open Datasets | Yes | We use the CMU Mocap database, which contains more than 2000 clips... (Footnote: The CMU motion capture database is available at mocap.cs.cmu.edu.)
Dataset Splits | No | The paper mentions a 'training set' and 'held-out clips' but does not specify explicit percentages or sample counts for training, validation, and test splits.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory amounts used for running the experiments.
Software Dependencies | Yes | Here we use an off-policy RL algorithm, SVG(0) (Heess et al., 2015), with Retrace (Munos et al., 2016). ... We used the reparametrization trick (Kingma & Welling, 2013; Rezende et al., 2014) to train the model and used stochastic gradient descent with ADAM (Kingma & Ba, 2015) with a learning rate of 0.0001. [Training step sketched in code below the table.]
Experiment Setup | Yes | The decoder p(a_t | s_t, z_t) in our experiments was an MLP with three layers of 1024 hidden units... fixed standard deviation of 0.1... The encoder q(z_t | z_{t-1}, x_t) in our experiments was also an MLP with two layers of 1024 hidden units each. ... In most of our experiments, we used a 60-dimensional latent space. ... learning rate of 0.0001. In the case of models trained on 100 trajectories per expert we used minibatches of 512 subsequences of length 30. For LFPC we sampled 32 subsequences of length 30 and produced 5 perturbed state sequences per subsequence. [Architecture sketched in code below the table.]
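
The architecture quoted under Experiment Setup maps directly onto a pair of small MLPs. Below is a minimal PyTorch sketch under the stated hyperparameters (three 1024-unit decoder layers, two 1024-unit encoder layers, a 60-dimensional latent space, fixed action standard deviation of 0.1); the input/output sizes and the ELU activation are assumptions for illustration, as this excerpt does not specify them.

```python
import torch
import torch.nn as nn

LATENT_DIM = 60    # latent space size reported in the paper
ACTION_STD = 0.1   # fixed decoder standard deviation reported in the paper
STATE_DIM = 100    # assumption: proprioceptive state size, not given here
ACTION_DIM = 56    # assumption: humanoid action size, illustrative only
X_DIM = 200        # assumption: encoder input size (e.g., a short window
                   # of reference states), illustrative only

# Decoder p(a_t | s_t, z_t): three hidden layers of 1024 units, producing
# the mean of a Gaussian action distribution with fixed std 0.1.
decoder = nn.Sequential(
    nn.Linear(STATE_DIM + LATENT_DIM, 1024), nn.ELU(),
    nn.Linear(1024, 1024), nn.ELU(),
    nn.Linear(1024, 1024), nn.ELU(),
    nn.Linear(1024, ACTION_DIM),
)

# Encoder q(z_t | z_{t-1}, x_t): two hidden layers of 1024 units, producing
# the mean and log-variance of the Gaussian latent distribution.
encoder = nn.Sequential(
    nn.Linear(LATENT_DIM + X_DIM, 1024), nn.ELU(),
    nn.Linear(1024, 1024), nn.ELU(),
    nn.Linear(1024, 2 * LATENT_DIM),  # split into [mu, log_var]
)

def decode_action(s_t, z_t):
    # Gaussian action distribution with the reported fixed std.
    mean = decoder(torch.cat([s_t, z_t], dim=-1))
    return torch.distributions.Normal(mean, ACTION_STD)
```

The training recipe quoted under Software Dependencies (reparametrization trick plus Adam at a learning rate of 0.0001) is likewise straightforward to make concrete. A minimal sketch, assuming a Gaussian latent parameterized by mean and log-variance; `model` here is a hypothetical stand-in for the encoder/decoder pair, not the paper's actual module:

```python
import torch

def reparametrize(mu, log_var):
    # z = mu + sigma * eps with eps ~ N(0, I), so gradients flow through
    # mu and log_var (Kingma & Welling, 2013; Rezende et al., 2014).
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * log_var) * eps

# Adam at the learning rate reported in the paper (0.0001).
model = torch.nn.Linear(120, 60)  # placeholder module for illustration
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```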
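
A brief design note on the sketches above: splitting the encoder head into mean and log-variance (rather than predicting a standard deviation directly) keeps the scale strictly positive after exponentiation, which is why the reparametrization helper takes `log_var`; this is a common convention for such models, not a detail stated in this excerpt.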