Wasserstein Gradient Flows for Optimizing Gaussian Mixture Policies

Authors: Hanna Ziesche, Leonel Rozo

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on common robotic settings: reaching motions, collision-avoidance behaviors, and multi-goal tasks. Our results show that our method outperforms common policy optimization baselines in terms of task success rate and low-variance solutions.
Researcher Affiliation | Industry | Hanna Ziesche and Leonel Rozo, Bosch Center for Artificial Intelligence (BCAI), Renningen, Germany, name.surname@de.bosch.com
Pseudocode | Yes | Algorithm 1: GMM Policy Optimization via Wasserstein Gradient Flows
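The pseudocode itself is not reproduced in this report. As a minimal, hypothetical illustration of the geometry such an algorithm operates on (not the paper's Algorithm 1), the snippet below computes the closed-form 2-Wasserstein (Bures-Wasserstein) distance between two Gaussian components, the basic quantity underlying Wasserstein gradient flows restricted to Gaussian mixture policies. The function name and example values are assumptions for illustration only.

```python
# Minimal sketch: closed-form squared 2-Wasserstein distance between Gaussians,
# W2^2 = ||mu1 - mu2||^2 + tr(S1 + S2 - 2 (S2^{1/2} S1 S2^{1/2})^{1/2}).
import numpy as np
from scipy.linalg import sqrtm


def gaussian_w2_squared(mu1, sigma1, mu2, sigma2):
    """Squared 2-Wasserstein distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    mean_term = np.sum((mu1 - mu2) ** 2)
    s2_sqrt = sqrtm(sigma2)
    cross = sqrtm(s2_sqrt @ sigma1 @ s2_sqrt)
    bures_term = np.trace(sigma1 + sigma2 - 2.0 * np.real(cross))
    return mean_term + bures_term


# Example with two 2-D Gaussian components (values are illustrative).
mu_a, cov_a = np.zeros(2), np.eye(2)
mu_b, cov_b = np.array([1.0, 0.0]), 2.0 * np.eye(2)
print(gaussian_w2_squared(mu_a, cov_a, mu_b, cov_b))  # = 1 + 2*(sqrt(2) - 1)^2
```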
Open Source Code | No | Explanation: The paper mentions using and extending existing libraries (Pymanopt) and relying on existing implementations of the baselines (Stable-Baselines3, PMOE), but it does not provide an explicit statement or link to open-source code for its own proposed method.
Open Datasets | No | Explanation: The paper states that initial policies were "learned from human demonstrations collected on a Python GUI" but does not provide concrete access information (link, DOI, or specific citation) for this dataset, nor does it use a well-known public dataset.
Dataset Splits | No | Explanation: The paper describes data collection as "human demonstrations" and mentions "policy rollouts", but it does not specify how this data was split into training, validation, or test sets (e.g., exact percentages, sample counts, or references to predefined splits).
Hardware Specification | No | Explanation: The paper names the robot model used in the tasks (a 7-DoF Franka Emika Panda) and notes that the experiments were run in a virtual environment, but it does not specify the computing hardware (e.g., GPU/CPU models, memory, or cluster specifications) used to run the experiments or train the models.
Software Dependencies | No | Explanation: The paper lists several software tools and libraries (e.g., Pymanopt, POT, Stable-Baselines3, Optuna, Robotics Toolbox for Python) but does not provide version numbers for these dependencies, which are needed for full reproducibility.
Experiment Setup | Yes | Each optimization episode comprised 10 rollouts, each with a maximum horizon of 200 iterations; convergence is reached when the average position error w.r.t. the target, computed over an episode, falls below a minimum threshold (reaching task). Each episode consisted of 10 rollouts with a maximum horizon of 150 iterations; convergence is determined by the same criterion (collision-avoidance task). An episode comprised 10 rollouts with a maximum horizon of 200 iterations; again, policy optimization converges when the average position error w.r.t. the chosen target reaches a minimum threshold (multiple-goal task). The GMM models were initially trained via classical Expectation-Maximization. A policy rollout consisted of sampling a velocity action a_t ~ π(a_t|s_t) using Eq. 9 and commanding the robot via a Cartesian velocity controller at 100 Hz. The baselines were tuned separately for each task using Optuna [70].
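For concreteness, the sketch below mirrors the rollout and convergence logic quoted above. It is a hypothetical toy, not the paper's implementation: the unconditional GMM action sampler stands in for the paper's Eq. 9, the point-mass "robot" replaces the 7-DoF Panda and its Cartesian velocity controller, and the threshold value is an assumption.

```python
# Minimal toy sketch of one optimization episode: 10 rollouts, velocity actions
# sampled from a GMM, 100 Hz integration, and an average-position-error check.
# All policy parameters, the point-mass dynamics, and the threshold are assumptions.
import numpy as np

CONTROL_DT = 1.0 / 100.0       # 100 Hz Cartesian velocity control rate
ROLLOUTS_PER_EPISODE = 10
MAX_HORIZON = 200              # 150 for the collision-avoidance task
ERROR_THRESHOLD = 5e-2         # illustrative convergence threshold


def sample_gmm_action(weights, means, covs, rng):
    """Sample a velocity action from a GMM (state conditioning omitted here)."""
    k = rng.choice(len(weights), p=weights)
    return rng.multivariate_normal(means[k], covs[k])


def run_episode(weights, means, covs, target, rng):
    """Average final position error over one episode of rollouts."""
    errors = []
    for _ in range(ROLLOUTS_PER_EPISODE):
        pos = np.zeros(3)                                   # toy point-mass start
        for _ in range(MAX_HORIZON):
            vel = sample_gmm_action(weights, means, covs, rng)
            pos = pos + vel * CONTROL_DT                    # velocity integration
        errors.append(np.linalg.norm(pos - target))
    return float(np.mean(errors))


rng = np.random.default_rng(0)
target = np.array([0.4, 0.2, 0.3])
weights = np.array([0.5, 0.5])
means = [target / (MAX_HORIZON * CONTROL_DT), np.zeros(3)]  # one "good" component
covs = [0.01 * np.eye(3), 0.01 * np.eye(3)]

avg_error = run_episode(weights, means, covs, target, rng)
print(f"average position error: {avg_error:.3f}",
      "(converged)" if avg_error < ERROR_THRESHOLD else "(keep optimizing)")
```

In the paper's actual setup, the GMM parameters would be updated between episodes via the Wasserstein gradient flow of Algorithm 1 until this average error criterion is met.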