Wasserstein Gradient Flows for Optimizing Gaussian Mixture Policies
Authors: Hanna Ziesche, Leonel Rozo
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on common robotic settings: Reaching motions, collision-avoidance behaviors, and multi-goal tasks. Our results show that our method outperforms common policy optimization baselines in terms of task success rate and low-variance solutions. |
| Researcher Affiliation | Industry | Hanna Ziesche and Leonel Rozo, Bosch Center for Artificial Intelligence (BCAI), Renningen, Germany; name.surname@de.bosch.com |
| Pseudocode | Yes | Algorithm 1 GMM Policy Optimization via Wasserstein Gradient Flows |
| Open Source Code | No | Explanation: The paper mentions using and extending existing libraries (Pymanopt) and using implementations for baselines (Stable-Baselines3, PMOE), but it does not provide an explicit statement or link for the open-source code of its own proposed methodology. |
| Open Datasets | No | Explanation: The paper states that initial policies were "learned from human demonstrations collected on a Python GUI" but does not provide any concrete access information (link, DOI, specific citation) for this dataset, nor does it specify using a well-known public dataset. |
| Dataset Splits | No | Explanation: The paper describes data collection as "human demonstrations" and mentions "policy rollouts", but it does not provide specific details on how this data was split into training, validation, or test sets (e.g., exact percentages, sample counts, or references to predefined splits). |
| Hardware Specification | No | Explanation: The paper mentions the robot model used for the tasks (a 7-DoF Franka Emika Panda robot) and that experiments were run in a "virtual environment", but it does not provide specific details about the computing hardware (e.g., GPU/CPU models, memory, or cluster specifications) used for running the experiments or training the models. |
| Software Dependencies | No | Explanation: The paper lists several software tools and libraries used (e.g., Pymanopt, POT, Stable-Baselines3, Optuna, Robotics Toolbox for Python) but does not provide specific version numbers for these dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | Each optimization episode comprised 10 rollouts, each with a maximum horizon of 200 iterations; convergence is reached when the average position error w.r.t. the target, computed over an episode, falls below a minimum threshold (Reaching task). Each episode consisted of 10 rollouts, each with a maximum horizon of 150 iterations; convergence is again determined by a minimum average position error w.r.t. the target computed over an episode (Collision-avoidance task). An episode comprised 10 rollouts, each with a maximum horizon of 200 iterations; the policy optimization converges when the average position error w.r.t. the chosen target reaches a minimum threshold (Multiple-goal task). The GMM models were initially trained via classical Expectation-Maximization. The policy rollout consisted of sampling a velocity action a_t ∼ π(a_t \| s_t) using Eq. 9 and subsequently commanding the robot via a Cartesian velocity controller at a frequency of 100 Hz. We tuned the baselines separately for each task using Optuna [70]. Hedged sketches of the conditional-GMM action sampling and of the episode/rollout loop follow the table. |
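The reported rollout step samples a velocity action a_t ∼ π(a_t | s_t) from the GMM policy. The sketch below illustrates one standard way to do this, by conditioning a joint GMM over (state, action) on the current state; this is generic Gaussian-mixture conditioning, not a transcription of the paper's Eq. 9, and all names, shapes, and the joint-GMM assumption are ours.

```python
# Hypothetical sketch: sample a_t ~ pi(a_t | s_t) from a GMM policy by
# conditioning a joint GMM over (state, action) on the observed state.
import numpy as np
from scipy.stats import multivariate_normal

def sample_action(state, weights, means, covs, ds):
    """weights: (K,), means: (K, ds+da), covs: (K, ds+da, ds+da).
    `ds` is the state dimension; the remaining dimensions are the action."""
    K = len(weights)
    cond_means, cond_covs, resp = [], [], np.empty(K)
    for k in range(K):
        mu_s, mu_a = means[k][:ds], means[k][ds:]
        S_ss = covs[k][:ds, :ds]
        S_as = covs[k][ds:, :ds]
        S_aa = covs[k][ds:, ds:]
        gain = S_as @ np.linalg.inv(S_ss)
        # Conditional Gaussian of the action given the state for component k
        cond_means.append(mu_a + gain @ (state - mu_s))
        cond_covs.append(S_aa - gain @ S_as.T)
        # Responsibility: prior weight times the state-marginal likelihood
        resp[k] = weights[k] * multivariate_normal.pdf(state, mu_s, S_ss)
    resp /= resp.sum()
    k = np.random.choice(K, p=resp)  # pick a mixture component
    return np.random.multivariate_normal(cond_means[k], cond_covs[k])
```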
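The experiment setup above fixes the episode structure (10 rollouts per episode, horizons of 200 or 150 steps, convergence on the episode-averaged position error). A minimal sketch of that loop is given below; only those numeric settings come from the paper, while `policy`, `env`, the error threshold, and all method names are placeholders.

```python
# Minimal sketch of the episode/rollout structure described in the table row above.
import numpy as np

ROLLOUTS_PER_EPISODE = 10
MAX_HORIZON = 200        # 150 for the collision-avoidance task
ERROR_THRESHOLD = 1e-2   # assumed value; the paper only states "a minimum threshold"

def run_episode(policy, env):
    """Run one optimization episode and return the mean position error."""
    errors = []
    for _ in range(ROLLOUTS_PER_EPISODE):
        state = env.reset()
        for _ in range(MAX_HORIZON):
            action = policy.sample_action(state)  # a_t ~ pi(a_t | s_t)
            state, done = env.step(action)        # Cartesian velocity command (100 Hz control loop)
            if done:
                break
        errors.append(env.position_error())       # error w.r.t. the task target
    return float(np.mean(errors))

def optimize(policy, env, max_episodes=100):
    """Alternate rollouts and policy updates until the average error converges."""
    for _ in range(max_episodes):
        if run_episode(policy, env) < ERROR_THRESHOLD:
            break
        policy.update()  # e.g., one update of the GMM parameters as in the paper's Algorithm 1
```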