Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Prior-Guided Diffusion Planning for Offline Reinforcement Learning

Authors: Donghyeon Ki, JunHyeok Oh, Seong-Woong Shim, Byung-Jun Lee

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our code is available at https://github.com/ku-dmlab/PG. empirically demonstrate that PG outperforms state-of-the-art diffusion policies and planners across diverse long-horizon offline RL benchmarks. Empirically, PG outperforms existing diffusion-based methods and achieves state-of-the-art performance on long-horizon tasks in the D4RL offline RL benchmark suite [24].
Researcher Affiliation	Collaboration	Donghyeon Ki¹, JunHyeok Oh¹, Seong-Woong Shim¹, Byung-Jun Lee¹,²; ¹Korea University, ²Gauss Labs Inc.
Pseudocode	Yes	Algorithm 1 Prior Guidance Input: Dataset D, Hyperparameter α Require: Planner gs, Inverse Dynamics ϵω, Critic V Initialize: Prior pψ, Critic Vϕ
Open Source Code	Yes	Our code is available at https://github.com/ku-dmlab/PG.
Open Datasets	Yes	Empirically, PG outperforms existing diffusion-based methods and achieves state-of-the-art performance on long-horizon tasks in the D4RL offline RL benchmark suite [24].
Dataset Splits	Yes	We conducted experiments on the D4RL offline RL benchmark [24], which span a wide range of domains and dataset settings. For MuJoCo tasks, we report the average performance over 10 evaluation trajectories for each of 5 independently trained models. For Kitchen, AntMaze, and Maze2D tasks, we average over 100 evaluation trajectories for each of 5 independently trained models.
Hardware Specification	Yes	We conducted all experiments using four NVIDIA RTX 4090 GPUs.
Software Dependencies	No	The paper mentions several algorithms, models, and optimizers by name and citation (e.g., GRU [30], Transformer [48], DDIM [25], DDPM [5], AdamW [49], Adam [50]), but it does not specify any particular software libraries with version numbers (like PyTorch 1.9 or Python 3.8) that would be needed for replication.
Experiment Setup	Yes	This section outlines the hyperparameters used to train Prior Guidance. The planner gs, inverse dynamics model ϵω, and critic V all follow the same settings as those used in Diffusion Veteran. Detailed hyperparameter settings are provided in Table 5.
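
The evaluation protocol quoted in the Dataset Splits row (several evaluation trajectories for each of 5 independently trained models) can be illustrated with a minimal Python sketch. This is not taken from the paper's repository: the function name `aggregate_scores`, the input layout, and the choice of sample standard deviation across models as the spread measure are all assumptions made for illustration, since the quoted text does not say how variability is reported.

```python
from statistics import mean, stdev


def aggregate_scores(scores_per_model):
    """Aggregate evaluation scores across independently trained models.

    scores_per_model: one inner list of per-trajectory scores for each
    trained model (hypothetical layout, e.g. 5 lists of 10 scores each).
    Returns (overall_mean, spread_across_models).
    """
    if not scores_per_model or any(len(s) == 0 for s in scores_per_model):
        raise ValueError("each model needs at least one evaluation score")

    # Step 1: average the evaluation trajectories within each model.
    per_model_means = [mean(s) for s in scores_per_model]

    # Step 2: average the per-model means across models.
    overall = mean(per_model_means)

    # Assumption: report the sample standard deviation across models;
    # with a single model there is no spread to estimate.
    spread = stdev(per_model_means) if len(per_model_means) > 1 else 0.0
    return overall, spread


if __name__ == "__main__":
    # Made-up scores for two models with two trajectories each.
    overall, spread = aggregate_scores([[1.0, 3.0], [5.0, 7.0]])
    print(overall, spread)
```

Averaging within each model first, then across models, treats every trained model as one sample, so the spread reflects variation between training runs rather than between individual trajectories.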