Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Constrained Diffusers for Safe Planning and Control

Authors: Jichen Zhang, Liqun Zhao, Antonis Papachristodoulou, Jack Umenberger

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments in Maze2D, locomotion, and PyBullet ball running tasks demonstrate that our proposed methods achieve constraint satisfaction with less computation time, and are competitive with existing methods in environments with static and time-varying constraints.
Researcher Affiliation | Academia | Jichen Zhang, Liqun Zhao, Antonis Papachristodoulou, Jack Umenberger; University of Oxford; EMAIL
Pseudocode | Yes | The pseudo-code of the complete algorithm is shown below. Algorithm 1: Constrained Diffusers (use Primal-Dual as an example)
Open Source Code | Yes | The implementation can be found here. (...) Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: we provide codes in the supplemental material.
Open Datasets | Yes | For Maze2D, we used Maze2D-umaze-v1 and Maze2D-large-v1 from D4RL; for MuJoCo locomotion tasks, we used hopper-medium-expert-v2 from D4RL and swimmer-medium-expert-v2 from Minari; for PyBullet ball running tasks, we used SafetyBallRun-v0 from DSRL.
Dataset Splits | No | All diffuser models in our experiments were trained from scratch using expert demonstration data. For Maze2D, we used Maze2D-umaze-v1 and Maze2D-large-v1 from D4RL; for MuJoCo locomotion tasks, we used hopper-medium-expert-v2 from D4RL and swimmer-medium-expert-v2 from Minari; for PyBullet ball running tasks, we used SafetyBallRun-v0 from DSRL.
Hardware Specification | Yes | The experiments are all running on the machine with Intel Core i7-14700 × 28, 32GB Memory, NVIDIA GeForce RTX 4060 and the operational system Ubuntu 24.04.
Software Dependencies | No | The paper mentions environments and frameworks like Maze2D, D4RL, Gymnasium Mujoco, Minari, PyBullet, and DSRL but does not provide specific version numbers for any of these software dependencies.
Experiment Setup | Yes | The experimental details are summarized as follows: In Maze2D environment, the noise is approximated by a temporal U-Net [3]. The training settings and codes are borrowed from [3, 9]. For Maze2D-umaze tasks, we use a planning horizon of 128, 64 diffusion steps and additional 64 steps in reverse process. The constraint is a linear inequality relating x + y to b, where (x, y) is the position state of the agent and b is the parameter of the boundary. The learning rate η for dual variables in primal-dual methods is 0.025 and the initial penalty ρ is 0.05. For Maze2D-large tasks, we use a planning horizon of 384, 256 diffusion steps and additional 200 steps in reverse process. The constraint is formulated as ((x − x_0)/a)² + ((y − y_0)/b)² ≥ 1, where (x, y) is the position state of the agent and (x_0, y_0, a, b) are the parameters of the boundary. The learning rate η for dual variables in primal-dual methods is 1e-3 and the initial penalty ρ is 2.5e-4. In Gymnasium Mujoco and PyBullet environment, the temporal U-Net architecture, training settings and codes are borrowed from [4]. For Hopper tasks, we use a planning horizon of 100, 200 diffusion steps and additional 100 steps in reverse process. The constraint is formulated as ω ≤ ω_max, where ω is the angular velocity of the thigh hinge. The learning rate η for dual variables in primal-dual methods is 0.1 and the initial penalty ρ is 0.01. The hyperparameter α in DCBFs is 0.85. For Swimmer, we use a planning horizon of 100, 200 diffusion steps and additional 100 steps in reverse process. The constraint is formulated as ω ≤ ω_max, where ω is the angular velocity of the second rotor. The learning rate η for dual variables in primal-dual methods is 0.5 and the initial penalty ρ is 0.01. The hyperparameter α in DCBFs is 0.85. For Safety Ball Running tasks, we use a planning horizon of 100, 200 diffusion steps and additional 100 steps in reverse process. The constraint is formulated as [x − (x_0 + v_x τ)]² + [y − (y_0 + v_y τ)]² ≥ (R + r)², where (x, y) is the position state of the ball agent, (x_0, y_0) represents the position of the obstacle ball, (v_x, v_y) represents the velocity, R represents the radius of agent ball and r represents the radius of the obstacle ball. The learning rate η for dual variables in primal-dual methods is 0.1 and the initial penalty ρ is 0.01. The hyperparameter α in DCBFs ranges from 0.3 to 0.8.
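For readers who want the reported settings in one place, the following is a minimal Python sketch that collects the hyperparameters quoted in the Experiment Setup row and evaluates the moving-obstacle constraint from the ball running task. All names here (`SetupConfig`, `SETUPS`, `moving_obstacle_margin`, `dual_ascent_step`) are hypothetical and are not taken from the authors' code. The sketch assumes the ball constraint means the agent must stay outside the obstacle (collision avoidance), and the dual update shown is the generic projected dual-ascent step, not necessarily the update used in the paper's Algorithm 1.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SetupConfig:
    """Settings quoted in the Experiment Setup row (field names are illustrative)."""
    horizon: int                  # planning horizon
    diffusion_steps: int          # number of diffusion steps
    extra_reverse_steps: int      # additional steps in the reverse process
    dual_lr: float                # learning rate eta for the dual variables
    init_penalty: float           # initial penalty rho
    dcbf_alpha: Optional[Tuple[float, float]] = None  # (min, max) of alpha, if reported


# Values copied from the quoted text; no DCBF alpha is reported for Maze2D.
SETUPS = {
    "maze2d-umaze": SetupConfig(128, 64, 64, 0.025, 0.05),
    "maze2d-large": SetupConfig(384, 256, 200, 1e-3, 2.5e-4),
    "hopper": SetupConfig(100, 200, 100, 0.1, 0.01, (0.85, 0.85)),
    "swimmer": SetupConfig(100, 200, 100, 0.5, 0.01, (0.85, 0.85)),
    "safety-ball-run": SetupConfig(100, 200, 100, 0.1, 0.01, (0.3, 0.8)),
}


def moving_obstacle_margin(x, y, x0, y0, vx, vy, tau, R, r):
    """Squared distance to the obstacle centre at time tau minus (R + r)^2.

    Under the collision-avoidance assumption, a non-negative value means
    the constraint is satisfied.
    """
    cx = x0 + vx * tau  # obstacle centre after moving for time tau
    cy = y0 + vy * tau
    return (x - cx) ** 2 + (y - cy) ** 2 - (R + r) ** 2


def dual_ascent_step(lam, violation, eta):
    """Generic projected dual ascent: lam <- max(0, lam + eta * violation).

    `violation` is g(x) for a constraint written as g(x) <= 0.
    """
    return max(0.0, lam + eta * violation)
```

As a usage example, `dual_ascent_step(0.0, -moving_obstacle_margin(...), SETUPS["safety-ball-run"].dual_lr)` would increase the multiplier only when the agent overlaps the obstacle, since the violation is the negated margin.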