Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

DynaGuide: Steering Diffusion Polices with Active Dynamic Guidance

Authors: Maximilian Du, Shuran Song

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We demonstrate the performance and features of DynaGuide against other steering approaches in a series of simulated and real experiments, showing an average steering success of 70% on a set of articulated CALVIN tasks and outperforming goal-conditioning by 5.4x when steered with low-quality objectives. We also successfully steer an off-the-shelf real robot policy to express preference for particular objects and even create novel behavior. Videos and other visualizations can be found on the project website: https://dynaguide.github.io
Researcher Affiliation Academia Maximilian Du, Stanford University; Shuran Song, Stanford University
Pseudocode Yes Algorithm 1 DynaGuide (Inference-Time)
  1: Input: Guidance Conditions g+, g−, Dynamics model hθ
  2: Input: Action denoiser ϵϕ(a, o), current obs o_t
  3: a^K ← Sample from N(0, I)
  4: for k in K to 1 do  ▷ Action Denoising
  5:   for i in 1 to M do  ▷ Stochastic Sampling
  6:     ϵ ← ϵϕ(a^k, o_t)
  7:     d ← Eq. 2
  8:     ϵ̂(a^k, o_t) ← ϵ − s √(1 − ᾱ_k) ∇_{a^k} d
  9:     if i < M then
  10:      a^k ← Denoise a^k using ϵ̂
  11:    else
  12:      a^{k−1} ← Denoise a^k using ϵ̂
  13:    end if
  14:  end for
  15: end for
Open Source Code Yes We will make code and collected data publicly available.
Open Datasets Yes To investigate these claims, we conduct five experiments using simulated CALVIN environment tasks [30] and three experiments on a real robot arm. [...] For the real-world experiments (Section 4.5), we use open-source data collected on the UMI interface [6].
Dataset Splits Yes We use the CALVIN-D dataset to train the base policy. [...] We train the dynamics model on the CALVIN-ABCD dataset, which is the full data split provided by the benchmark. [...] We reset the robot randomly by sampling a starting pose in a validation set of trajectories.
Hardware Specification Yes All policies and dynamics models were trained on single RTX 3090 GPUs with 24GB VRAM, taking 24-48 hours to convergence. The dynamics model has 15M trainable parameters and takes up 4GB of GPU memory during training and inference. All experiments were conducted on single RTX 3090 GPUs, taking 10-20 minutes per seed per task.
Software Dependencies No The diffusion policy is trained to take a 2-step history stack of visual observations and robot proprioception and predict a chunk of 16 actions. It uses a ResNet-18 image encoder that conditions a U-Net with 4 encoding and 4 decoding layers. [...] We use the DinoV2 patch embeddings as the latent space [...] For the latent dynamics model, we use a 6-layer Transformer Encoder with 8 heads.
Experiment Setup Yes We use the Adam optimizer with a learning rate of 1e-4. We train the model for 200k gradient steps with a batch size of 16. All parts of the diffusion policy are trained together using expert data. During execution, 14 actions are executed in the environment open-loop before the policy is queried again. [...] We use the Adam optimizer at a learning rate of 1e-4. We train the model for 600k gradient steps with batch size 16, a point past model convergence. [...] We use stochastic sampling M = 4 for our experiments as a balance of stability and computation efficiency. We use 20 guidance conditions per task...
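
The inference-time loop quoted in the Pseudocode row can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the standard classifier-guidance form, in which the predicted noise is shifted by s·√(1 − ᾱ_k) times the gradient of a guidance metric d that is larger for more desirable actions, and it assumes that the inner "stochastic sampling" step denoises once and then re-noises back to level k. The names `guided_sample`, `toy_eps_model`, `toy_grad_d`, and `TARGET`, the linear noise schedule, and the toy metric are all hypothetical stand-ins: in the paper, d comes from Eq. 2 and a learned dynamics model, neither of which is reproduced here.

```python
import numpy as np


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumed, commonly used default)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)


def guided_sample(eps_model, grad_d, obs, shape, *, num_steps=50,
                  num_recur=4, scale=1.0, rng=None):
    """Guided DDPM sampling with an inner recurrence loop (illustrative only).

    eps_model(a, obs, alpha_bar) -> predicted noise (hypothetical signature)
    grad_d(a, obs)               -> gradient of the guidance metric d w.r.t. a
    """
    rng = np.random.default_rng() if rng is None else rng
    betas, alphas, alpha_bars = make_schedule(num_steps)
    a = rng.standard_normal(shape)                 # a^K ~ N(0, I)
    for k in range(num_steps, 0, -1):              # action denoising, k = K..1
        beta, alpha, alpha_bar = betas[k - 1], alphas[k - 1], alpha_bars[k - 1]
        for i in range(1, num_recur + 1):          # stochastic sampling, i = 1..M
            eps = eps_model(a, obs, alpha_bar)
            # Shift the noise prediction along the gradient of d.
            eps_hat = eps - scale * np.sqrt(1.0 - alpha_bar) * grad_d(a, obs)
            # Standard DDPM reverse step from level k to level k-1.
            a_prev = (a - beta / np.sqrt(1.0 - alpha_bar) * eps_hat) / np.sqrt(alpha)
            if k > 1:
                a_prev = a_prev + np.sqrt(beta) * rng.standard_normal(shape)
            if i < num_recur:
                # Assumption: stay at level k by re-noising the denoised sample.
                a = np.sqrt(alpha) * a_prev + np.sqrt(beta) * rng.standard_normal(shape)
            else:
                a = a_prev                         # move on to level k-1
    return a


def toy_eps_model(a, obs, alpha_bar):
    """Exact noise prediction when clean actions are distributed as N(0, I)."""
    return np.sqrt(1.0 - alpha_bar) * a


TARGET = np.array([2.0, 2.0])


def toy_grad_d(a, obs):
    """Gradient of the toy metric d(a) = -0.5 * ||a - TARGET||^2."""
    return -(a - TARGET)


rng = np.random.default_rng(0)
unguided = guided_sample(toy_eps_model, toy_grad_d, None, (512, 2), scale=0.0, rng=rng)
guided = guided_sample(toy_eps_model, toy_grad_d, None, (512, 2), scale=1.0, rng=rng)
print("unguided mean:", unguided.mean(axis=0))
print("guided mean:  ", guided.mean(axis=0))
```

With `scale=0.0` the loop reduces to ordinary DDPM sampling from the toy base distribution, while a positive `scale` pulls the batch of samples toward `TARGET`. The defaults `num_recur=4` mirrors the M = 4 quoted in the Experiment Setup row; the number of denoising steps and the guidance scale are arbitrary choices for the toy problem.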