Neural Probabilistic Motor Primitives for Humanoid Control

Authors: Josh Merel, Leonard Hasenclever, Alexandre Galashov, Arun Ahuja, Vu Pham, Greg Wayne, Yee Whye Teh, Nicolas Heess

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that it is possible to train this model entirely offline to compress thousands of expert policies and learn a motor primitive embedding space. The trained neural probabilistic motor primitive system can perform one-shot imitation of whole-body humanoid behaviors, robustly mimicking unseen trajectories. Additionally, we demonstrate that it is also straightforward to train controllers to reuse the learned motor primitive space to solve tasks, and the resulting movements are relatively naturalistic. To support the training of our model, we compare two approaches for offline policy cloning...
Researcher Affiliation | Industry | Josh Merel, Leonard Hasenclever, Alexandre Galashov, Arun Ahuja, Vu Pham, Greg Wayne, Yee Whye Teh, & Nicolas Heess; DeepMind, London, UK; {jsmerel,leonardh,agalashov,arahuja,vuph,gregwayne,ywteh,heess}@google.com
Pseudocode | No | The paper describes algorithms and methods but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement about making its source code available, nor a link to a code repository for the described methodology.
Open Datasets | Yes | We use the CMU Mocap database, which contains more than 2000 clips... (Footnote: The CMU motion capture database is available at mocap.cs.cmu.edu.)
Dataset Splits | No | The paper mentions a 'training set' and 'held-out clips' but does not specify explicit percentages or sample counts for training, validation, and test splits.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory amounts used for running the experiments.
Software Dependencies | Yes | Here we use an off-policy RL algorithm, SVG(0) (Heess et al., 2015), with Retrace (Munos et al., 2016). ... We used the reparametrization trick (Kingma & Welling, 2013; Rezende et al., 2014) to train the model and used stochastic gradient descent with ADAM (Kingma & Ba, 2015) with a learning rate of 0.0001. [Training step sketched in code below the table.]
Experiment Setup | Yes | The decoder p(a_t | s_t, z_t) in our experiments was an MLP with three layers of 1024 hidden units... fixed standard deviation of 0.1... The encoder q(z_t | z_{t-1}, x_t) in our experiments was also an MLP with two layers of 1024 hidden units each. ... In most of our experiments, we used a 60-dimensional latent space. ... learning rate of 0.0001. In the case of models trained on 100 trajectories per expert we used minibatches of 512 subsequences of length 30. For LFPC we sampled 32 subsequences of length 30 and produced 5 perturbed state sequences per subsequence. [Architecture sketched in code below the table.]
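
The architecture quoted under Experiment Setup maps directly onto a pair of small MLPs. Below is a minimal PyTorch sketch under the stated hyperparameters (three 1024-unit decoder layers, two 1024-unit encoder layers, a 60-dimensional latent space, fixed action standard deviation of 0.1); the input/output sizes and the ELU activation are assumptions for illustration, as this excerpt does not specify them.

```python
import torch
import torch.nn as nn

LATENT_DIM = 60    # latent space size reported in the paper
ACTION_STD = 0.1   # fixed decoder standard deviation reported in the paper
STATE_DIM = 100    # assumption: proprioceptive state size, not given here
ACTION_DIM = 56    # assumption: humanoid action size, illustrative only
X_DIM = 200        # assumption: encoder input size (e.g., a short window
                   # of reference states), illustrative only

# Decoder p(a_t | s_t, z_t): three hidden layers of 1024 units, producing
# the mean of a Gaussian action distribution with fixed std 0.1.
decoder = nn.Sequential(
    nn.Linear(STATE_DIM + LATENT_DIM, 1024), nn.ELU(),
    nn.Linear(1024, 1024), nn.ELU(),
    nn.Linear(1024, 1024), nn.ELU(),
    nn.Linear(1024, ACTION_DIM),
)

# Encoder q(z_t | z_{t-1}, x_t): two hidden layers of 1024 units, producing
# the mean and log-variance of the Gaussian latent distribution.
encoder = nn.Sequential(
    nn.Linear(LATENT_DIM + X_DIM, 1024), nn.ELU(),
    nn.Linear(1024, 1024), nn.ELU(),
    nn.Linear(1024, 2 * LATENT_DIM),  # split into [mu, log_var]
)

def decode_action(s_t, z_t):
    # Gaussian action distribution with the reported fixed std.
    mean = decoder(torch.cat([s_t, z_t], dim=-1))
    return torch.distributions.Normal(mean, ACTION_STD)
```

The training recipe quoted under Software Dependencies (reparametrization trick plus Adam at a learning rate of 0.0001) is likewise straightforward to make concrete. A minimal sketch, assuming a Gaussian latent parameterized by mean and log-variance; `model` here is a hypothetical stand-in for the encoder/decoder pair, not the paper's actual module:

```python
import torch

def reparametrize(mu, log_var):
    # z = mu + sigma * eps with eps ~ N(0, I), so gradients flow through
    # mu and log_var (Kingma & Welling, 2013; Rezende et al., 2014).
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * log_var) * eps

# Adam at the learning rate reported in the paper (0.0001).
model = torch.nn.Linear(120, 60)  # placeholder module for illustration
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```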
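
A brief design note on the sketches above: splitting the encoder head into mean and log-variance (rather than predicting a standard deviation directly) keeps the scale strictly positive after exponentiation, which is why the reparametrization helper takes `log_var`; this is a common convention for such models, not a detail stated in this excerpt.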