Wasserstein Gradient Flows for Optimizing Gaussian Mixture Policies
Authors: Hanna Ziesche, Leonel Rozo
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on common robotic settings: Reaching motions, collision-avoidance behaviors, and multi-goal tasks. Our results show that our method outperforms common policy optimization baselines in terms of task success rate and low-variance solutions. |
| Researcher Affiliation | Industry | Hanna Ziesche and Leonel Rozo, Bosch Center for Artificial Intelligence (BCAI), Renningen, Germany; name.surname@de.bosch.com |
| Pseudocode | Yes | Algorithm 1 GMM Policy Optimization via Wasserstein Gradient Flows |
| Open Source Code | No | Explanation: The paper mentions using and extending existing libraries (Pymanopt) and using implementations for baselines (Stable-Baselines3, PMOE), but it does not provide an explicit statement or link for the open-source code of its own proposed methodology. |
| Open Datasets | No | Explanation: The paper states that initial policies were "learned from human demonstrations collected on a Python GUI" but does not provide any concrete access information (link, DOI, specific citation) for this dataset, nor does it specify using a well-known public dataset. |
| Dataset Splits | No | Explanation: The paper describes data collection as "human demonstrations" and mentions "policy rollouts", but it does not provide specific details on how this data was split into training, validation, or test sets (e.g., exact percentages, sample counts, or references to predefined splits). |
| Hardware Specification | No | Explanation: The paper mentions the robot model used for the tasks (a 7-DoF Franka Emika Panda robot) and that experiments were run in a "virtual environment", but it does not provide specific details about the computing hardware (e.g., GPU/CPU models, memory, or cluster specifications) used for running the experiments or training the models. |
| Software Dependencies | No | Explanation: The paper lists several software tools and libraries used (e.g., Pymanopt, POT, Stable-Baselines3, Optuna, Robotics Toolbox for Python) but does not provide specific version numbers for these dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | Each optimization episode comprised 10 rollouts, each with a maximum horizon of 200 iterations; convergence is reached when the average position error w.r.t. the target, computed over an episode, falls below a minimum threshold (Reaching task). Each episode consisted of 10 rollouts, each with a maximum horizon of 150 iterations; convergence is again determined by a minimum average position error w.r.t. the target computed over an episode (Collision-avoidance task). An episode comprised 10 rollouts, each with a maximum horizon of 200 iterations; the policy optimization converges when the average position error w.r.t. the chosen target reaches a minimum threshold (Multiple-goal task). The GMM models were initially trained via classical Expectation-Maximization. The policy rollout consisted of sampling a velocity action a_t ∼ π(a_t \| s_t) using Eq. 9 and subsequently commanding the robot via a Cartesian velocity controller at a frequency of 100 Hz. We tuned the baselines separately for each task using Optuna [70]. Hedged sketches of the conditional-GMM action sampling and of the episode/rollout loop follow the table. |
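The reported rollout step samples a velocity action a_t ∼ π(a_t | s_t) from the GMM policy. The sketch below illustrates one standard way to do this, by conditioning a joint GMM over (state, action) on the current state; this is generic Gaussian-mixture conditioning, not a transcription of the paper's Eq. 9, and all names, shapes, and the joint-GMM assumption are ours.

```python
# Hypothetical sketch: sample a_t ~ pi(a_t | s_t) from a GMM policy by
# conditioning a joint GMM over (state, action) on the observed state.
import numpy as np
from scipy.stats import multivariate_normal

def sample_action(state, weights, means, covs, ds):
    """weights: (K,), means: (K, ds+da), covs: (K, ds+da, ds+da).
    `ds` is the state dimension; the remaining dimensions are the action."""
    K = len(weights)
    cond_means, cond_covs, resp = [], [], np.empty(K)
    for k in range(K):
        mu_s, mu_a = means[k][:ds], means[k][ds:]
        S_ss = covs[k][:ds, :ds]
        S_as = covs[k][ds:, :ds]
        S_aa = covs[k][ds:, ds:]
        gain = S_as @ np.linalg.inv(S_ss)
        # Conditional Gaussian of the action given the state for component k
        cond_means.append(mu_a + gain @ (state - mu_s))
        cond_covs.append(S_aa - gain @ S_as.T)
        # Responsibility: prior weight times the state-marginal likelihood
        resp[k] = weights[k] * multivariate_normal.pdf(state, mu_s, S_ss)
    resp /= resp.sum()
    k = np.random.choice(K, p=resp)  # pick a mixture component
    return np.random.multivariate_normal(cond_means[k], cond_covs[k])
```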
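The experiment setup above fixes the episode structure (10 rollouts per episode, horizons of 200 or 150 steps, convergence on the episode-averaged position error). A minimal sketch of that loop is given below; only those numeric settings come from the paper, while `policy`, `env`, the error threshold, and all method names are placeholders.

```python
# Minimal sketch of the episode/rollout structure described in the table row above.
import numpy as np

ROLLOUTS_PER_EPISODE = 10
MAX_HORIZON = 200        # 150 for the collision-avoidance task
ERROR_THRESHOLD = 1e-2   # assumed value; the paper only states "a minimum threshold"

def run_episode(policy, env):
    """Run one optimization episode and return the mean position error."""
    errors = []
    for _ in range(ROLLOUTS_PER_EPISODE):
        state = env.reset()
        for _ in range(MAX_HORIZON):
            action = policy.sample_action(state)  # a_t ~ pi(a_t | s_t)
            state, done = env.step(action)        # Cartesian velocity command (100 Hz control loop)
            if done:
                break
        errors.append(env.position_error())       # error w.r.t. the task target
    return float(np.mean(errors))

def optimize(policy, env, max_episodes=100):
    """Alternate rollouts and policy updates until the average error converges."""
    for _ in range(max_episodes):
        if run_episode(policy, env) < ERROR_THRESHOLD:
            break
        policy.update()  # e.g., one update of the GMM parameters as in the paper's Algorithm 1
```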