Learning transferable motor skills with hierarchical latent mixture policies
Authors: Dushyant Rao, Fereshteh Sadeghi, Leonard Hasenclever, Markus Wulfmeier, Martina Zambelli, Giulia Vezzani, Dhruva Tirumala, Yusuf Aytar, Josh Merel, Nicolas Heess, Raia Hadsell
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate in manipulation domains that the method can effectively cluster offline data into distinct, executable behaviours, while retaining the flexibility of a continuous latent variable model. The resulting skills can be transferred and fine-tuned on new tasks, unseen objects, and from state to vision-based policies, yielding better sample efficiency and asymptotic performance compared to existing skill- and imitation-based methods. We further analyse how and when the skills are most beneficial: they encourage directed exploration to cover large regions of the state space relevant to the task, making them most effective in challenging sparse-reward settings. |
| Researcher Affiliation | Industry | Dushyant Rao, Fereshteh Sadeghi, Leonard Hasenclever, Markus Wulfmeier, Martina Zambelli, Giulia Vezzani, Dhruva Tirumala, Yusuf Aytar, Josh Merel, Nicolas Heess, & Raia Hadsell, DeepMind, London, UK |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper. The methodology is described through text, graphical models, and mathematical equations. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We focus on manipulation tasks, using a MuJoCo-based environment with a single Sawyer arm, and three objects coloured red, green, and blue. We follow the challenging object stacking benchmark of Lee et al. (2021), which specifies five object sets (Figure 2)... To evaluate our approach and baselines in the manipulation settings, we use two datasets: red_on_blue_stacking: this data is collected by an agent trained to stack the red object on the blue object and ignore the green one, for the simplest object set, set4. all_pairs_stacking: similar to the previous case, but with all six pairwise stacking combinations of {red, green, blue}, and covering all of the five object sets. |
| Dataset Splits | No | The paper mentions using 'red_on_blue_stacking' and 'all_pairs_stacking' datasets for offline learning and subsequent transfer experiments, but does not provide specific percentages or counts for training, validation, and test splits. It implies using the full dataset for offline learning and then evaluating on various transfer scenarios without explicitly defining traditional splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It mentions using a 'Mu Jo Co-based environment' but not the underlying hardware that runs the simulation or experiments. |
| Software Dependencies | No | The paper mentions software components such as 'MPO' (Maximum a posteriori Policy Optimisation), 'RHPO' (Regularized Hierarchical Policy Optimization), 'MuJoCo' (a physics engine), and 'ResNet' (a neural network architecture), but does not provide specific version numbers for any of them, which is required for reproducible software dependency information. |
| Experiment Setup | Yes | The network architecture details and hyperparameters for HeLMS are shown in Table 5. Parameter sweeps were performed for the β coefficients during offline learning and the η coefficients during RL. Small sweeps were also performed for the RHPO ϵ parameters... All RL experiments were run with 3 seeds to capture variation in each method. For network architectures, all experiments except for vision used simple 2-layer MLPs... Table 5: Hyperparameters and architecture details for HeLMS, for both offline training and RL. (See the sketch after this table.) |
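
The Experiment Setup row reports a concrete training configuration: 2-layer MLP networks for non-vision experiments, sweeps over the offline-learning β coefficients and the RL-stage η coefficients, small sweeps over the RHPO ϵ parameters, and 3 seeds per RL run. The sketch below is a minimal, hypothetical way to encode that configuration when attempting a reproduction; `HeLMSExperimentConfig`, `sweep_runs`, and every numeric value are assumptions for illustration only, not the authors' settings (which are listed in Table 5 of the paper).

```python
# Minimal, hypothetical sketch (not the authors' code) of the experiment setup
# described in the paper. All field names and numeric values are assumptions;
# the actual hyperparameters are given in Table 5 of the paper.
from dataclasses import dataclass
from typing import Dict, Iterator, Tuple


@dataclass(frozen=True)
class HeLMSExperimentConfig:
    # Non-vision experiments reportedly use simple 2-layer MLPs (widths assumed here).
    mlp_hidden_sizes: Tuple[int, ...] = (256, 256)
    # Offline-learning beta coefficients were swept (sweep values assumed).
    beta_sweep: Tuple[float, ...] = (0.1, 1.0, 10.0)
    # RL-stage eta coefficients were swept (sweep values assumed).
    eta_sweep: Tuple[float, ...] = (0.01, 0.1, 1.0)
    # Small sweeps were also performed for the RHPO epsilon parameters (values assumed).
    rhpo_epsilons: Tuple[float, ...] = (1e-3, 1e-2)
    # All RL experiments were run with 3 seeds.
    seeds: Tuple[int, ...] = (0, 1, 2)


def sweep_runs(cfg: HeLMSExperimentConfig) -> Iterator[Dict[str, float]]:
    """Enumerate (beta, eta, seed) combinations for a simple grid sweep."""
    for beta in cfg.beta_sweep:
        for eta in cfg.eta_sweep:
            for seed in cfg.seeds:
                yield {"beta": beta, "eta": eta, "seed": seed}


if __name__ == "__main__":
    cfg = HeLMSExperimentConfig()
    runs = list(sweep_runs(cfg))
    print(f"Grid sweep defines {len(runs)} runs across "
          f"{len(cfg.beta_sweep)} betas, {len(cfg.eta_sweep)} etas, {len(cfg.seeds)} seeds.")
```

A structured config like this makes the reported sweep ranges and seed counts explicit, which is the main information a reproduction would need beyond the architecture details in Table 5.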