Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Two-Steps Diffusion Policy for Robotic Manipulation via Genetic Denoising

Authors: Mateo Clémente, Léo Maxime Brunswic, Rui Heng Yang, Xuan Zhao, Yasser H. Khalil, Haoyu Lei, Amir Rasouli, Yinchuan Li

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our approach across 14 robotic manipulation tasks from D4RL and Robomimic, spanning multiple action horizons and inference budgets.
Researcher Affiliation	Industry	Mateo Clemente* Huawei Technologies Canada EMAIL Léo Maxime Brunswic* Huawei Technologies Canada EMAIL Rui Heng Yang Huawei Technologies Canada EMAIL Xuan Zhao Huawei Technologies Canada EMAIL Yasser H. Khalil Huawei Technologies Canada EMAIL Haoyu Lei Huawei EMAIL Amir Rasouli Huawei Technologies Canada EMAIL Yinchuan Li Huawei EMAIL
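Pseudocode	Yes	Algorithm 1 Genetic Diffusion Policy. Require: Diffusion Policy noise model $\epsilon_\theta$ with schedule $(\alpha_t)_{t \in [0,T]}$. Stochastic denoising rule $x_{t_{j-1}} = D(x_{t_j}, j, \epsilon_\theta(x^i_{t_j}, t_j))$. OoD score $\varphi(x^i_{t_j}, t_j, \epsilon_\theta(x^i_{t_j}, t_j))$. Population size $P$. Survival number $S$. Denoising steps $N$. 1: Sample $x^i_{t_N} \sim \mathcal{N}(0,1)$ for $i \in \{1,\dots,P\}$ 2: $j \leftarrow N$ 3: while $j > 0$ do 4: $j \leftarrow j-1$ 5: Compute scores $p_i = \varphi(x^i_{t_j}, t_j, \epsilon_\theta(x^i_{t_j}, t_j))$ 6: Select $S$ elements in $\{1,\dots,P\}$ with $(i_1,\dots,i_S) \sim \mathrm{Multinomial}(S, p_1,\dots,p_P)$ 7: $x^i_{t_{j-1}} \leftarrow D(x^{i_{i \% S}}_{t_j}, j, \epsilon_\theta(x^{i_{i \% S}}_{t_j}, t_j))$ 8: end while 9: Return $x^0_0$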
Open Source Code	No	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: The work has been conducted under company policy preventing the authors from publishing codes in a timeframe compatible with publication.
Open Datasets	Yes	We evaluate our method on 14 manipulation tasks from D4RL [8] and Robomimic [9], covering 6 action horizons and 18 inference budgets, using up to 500 seeds per configuration.
Dataset Splits	No	We evaluate our method on 14 manipulation tasks from D4RL [8] and Robomimic [9], covering 6 action horizons and 18 inference budgets, using up to 500 seeds per configuration. All methods use the same UNet architecture with 65M parameters from the official Diffusion Policy (DP) implementation [17] to ensure fairness. Each Adroit configuration is evaluated on 100 seeds, and each Robomimic configuration on 500 seeds. Since publicly released Adroit checkpoints do not cover all action horizons, we retrained each diffusion policy using the DP pipeline...
Hardware Specification	Yes	We measure step-wise overhead on an RTX 3080, batching the population in a single forward pass per step.
Software Dependencies	No	All methods use the same UNet architecture with 65M parameters from the official Diffusion Policy (DP) implementation [17] to ensure fairness. ... Shortcut models [7] were re-implemented in PyTorch and trained via 10 random hyperparameter seeds per task-horizon pair, keeping the best model. ... we retrained each diffusion policy using the DP pipeline with AdamW [18, 19], learning rate $10^{-4}$, weight decay $10^{-6}$, batch size 64, and 200 epochs.
Experiment Setup	Yes	We sweep over the following inference hyperparameters: Number of inference steps $\delta \in \{1,2,\dots,10\} \cup \{20,30,\dots,100\}$, Noise scaling factor $\gamma \in \{0.0,0.1,\dots,1.0\}$, Action horizon $h_A \in \{24,48,76,100,152,200\}$, Sampling method: DDPM, GDP, DDIM or Shortcut. ... we retrained each diffusion policy using the DP pipeline with AdamW [18, 19], learning rate $10^{-4}$, weight decay $10^{-6}$, batch size 64, and 200 epochs.
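
The selection-and-denoise loop quoted in the Pseudocode row can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation (the Open Source Code row notes that no code was released). The names `genetic_denoise`, `toy_denoise_step`, and `toy_ood_score` are hypothetical; the two toy functions merely stand in for the denoising rule D and the score φ. The sketch also assumes that the scores act as non-negative selection weights and that the step index runs from N down to 1.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def genetic_denoise(
    denoise_step: Callable[[Vector, int, random.Random], Vector],
    ood_score: Callable[[Vector, int], float],
    dim: int,
    population: int,
    survivors: int,
    num_steps: int,
    seed: int = 0,
) -> Vector:
    """Sketch of a population-based denoising loop with score-weighted selection."""
    if population < 1 or survivors < 1:
        raise ValueError("population and survivors must be positive")
    rng = random.Random(seed)
    # Step 1: start from P independent Gaussian noise samples.
    pop = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(population)]
    for j in range(num_steps, 0, -1):
        # Step 5: one selection weight per individual (assumed non-negative).
        weights = [ood_score(x, j) for x in pop]
        # Step 6: draw S survivor indices with replacement, proportional to the weights.
        chosen = rng.choices(range(population), weights=weights, k=survivors)
        # Step 7: individual i continues from survivor number i % S.
        pop = [
            denoise_step(pop[chosen[i % survivors]], j, rng)
            for i in range(population)
        ]
    # Step 9: return the first individual of the final population.
    return pop[0]


def toy_denoise_step(x: Vector, j: int, rng: random.Random) -> Vector:
    # Hypothetical stand-in for D: contract toward zero and add a little noise.
    return [0.5 * v + 0.01 * rng.gauss(0.0, 1.0) for v in x]


def toy_ood_score(x: Vector, j: int) -> float:
    # Hypothetical stand-in for the score: favor samples with a small norm.
    return math.exp(-sum(v * v for v in x))


# Two denoising steps with a population of 8 and 4 survivors per step.
action = genetic_denoise(
    toy_denoise_step, toy_ood_score,
    dim=2, population=8, survivors=4, num_steps=2, seed=0,
)
print(action)
```

In a real setting the two toy callables would be replaced by the trained noise model's denoising update and an out-of-distribution score; the Hardware Specification row indicates that the population is batched into a single forward pass per step, which this scalar sketch does not attempt.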
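
The inference sweep quoted in the Experiment Setup row can be enumerated as follows. This is a sketch that assumes a full Cartesian product over the four quoted hyperparameters; the source does not state that every combination was run. The variable and key names are hypothetical. Note that the step grid as quoted enumerates to 19 values, while the Open Datasets row mentions 18 inference budgets; the sketch follows the quoted set.

```python
from itertools import product

# Grids as quoted in the Experiment Setup row.
inference_steps = list(range(1, 11)) + list(range(20, 101, 10))
noise_scales = [k / 10 for k in range(11)]
action_horizons = [24, 48, 76, 100, 152, 200]
samplers = ["DDPM", "GDP", "DDIM", "Shortcut"]


def sweep_configs():
    """Yield one dict per combination (assumes a full Cartesian product)."""
    for steps, gamma, horizon, sampler in product(
        inference_steps, noise_scales, action_horizons, samplers
    ):
        yield {
            "steps": steps,
            "gamma": gamma,
            "horizon": horizon,
            "sampler": sampler,
        }


configs = list(sweep_configs())
print(len(configs))  # 19 * 11 * 6 * 4 = 5016
```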