Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline that was validated against a manually labeled dataset. Because LLM-based classification introduces uncertainty and potential bias, scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Prior-Guided Diffusion Planning for Offline Reinforcement Learning
Authors: Donghyeon Ki, JunHyeok Oh, Seong-Woong Shim, Byung-Jun Lee
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our code is available at https://github.com/ku-dmlab/PG." ... "…empirically demonstrate that PG outperforms state-of-the-art diffusion policies and planners across diverse long-horizon offline RL benchmarks." ... "Empirically, PG outperforms existing diffusion-based methods and achieves state-of-the-art performance on long-horizon tasks in the D4RL offline RL benchmark suite [24]." |
| Researcher Affiliation | Collaboration | Donghyeon Ki¹, JunHyeok Oh¹, Seong-Woong Shim¹, Byung-Jun Lee¹,² (¹Korea University, ²Gauss Labs Inc.) |
| Pseudocode | Yes | Algorithm 1 (Prior Guidance). Input: dataset $D$, hyperparameter $\alpha$. Require: planner $g_s$, inverse dynamics $\epsilon_\omega$, critic $V$. Initialize: prior $p_\psi$, critic $V_\phi$. |
| Open Source Code | Yes | Our code is available at https://github.com/ku-dmlab/PG. |
| Open Datasets | Yes | Empirically, PG outperforms existing diffusion-based methods and achieves state-of-the-art performance on long-horizon tasks in the D4RL offline RL benchmark suite [24]. |
| Dataset Splits | Yes | We conducted experiments on the D4RL offline RL benchmark [24], which spans a wide range of domains and dataset settings. For MuJoCo tasks, we report the average performance over 10 evaluation trajectories for each of 5 independently trained models. For Kitchen, AntMaze, and Maze2D tasks, we average over 100 evaluation trajectories for each of 5 independently trained models. |
| Hardware Specification | Yes | We conducted all experiments using four NVIDIA RTX 4090 GPUs. |
| Software Dependencies | No | The paper names several algorithms, models, and optimizers with citations (e.g., GRU [30], Transformer [48], DDIM [25], DDPM [5], AdamW [49], Adam [50]), but it does not specify any software libraries with version numbers (such as PyTorch 1.9 or Python 3.8) that would be needed for replication. |
| Experiment Setup | Yes | This section outlines the hyperparameters used to train Prior Guidance. The planner $g_s$, inverse dynamics model $\epsilon_\omega$, and critic $V$ all follow the same settings as those used in Diffusion Veteran. Detailed hyperparameter settings are provided in Table 5. |
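The Pseudocode row quotes only the header of Algorithm 1, so the update rule itself is not visible here. One plausible reading, which this table does not confirm, is that the prior $p_\psi$ replaces the standard Gaussian noise fed to the planner $g_s$, and that $\alpha$ weights a penalty keeping $p_\psi$ close to $\mathcal{N}(0, I)$. The sketch models $p_\psi$ as a hypothetical diagonal Gaussian with a learnable mean and log standard deviation. The planner, inverse dynamics model, and critic are not modeled; the critic's value for a decoded plan is simply passed in as a number, and `DiagonalGaussianPrior` and `regularized_objective` are illustrative names.

```python
import math
import random
from dataclasses import dataclass
from typing import List


@dataclass
class DiagonalGaussianPrior:
    """Hypothetical learnable prior p_psi = N(mu, diag(sigma^2)) over the
    planner's initial noise; the parameters psi are (mu, log_sigma)."""
    mu: List[float]
    log_sigma: List[float]

    def __post_init__(self) -> None:
        if len(self.mu) != len(self.log_sigma):
            raise ValueError("mu and log_sigma must have the same length")

    @classmethod
    def standard(cls, dim: int) -> "DiagonalGaussianPrior":
        # Start from N(0, I), the usual fixed diffusion prior.
        return cls([0.0] * dim, [0.0] * dim)

    def sample(self, rng: random.Random) -> List[float]:
        # Reparameterized draw: x = mu + sigma * eps with eps ~ N(0, I).
        return [m + math.exp(ls) * rng.gauss(0.0, 1.0)
                for m, ls in zip(self.mu, self.log_sigma)]

    def kl_to_standard_normal(self) -> float:
        # Closed-form KL(N(mu, sigma^2) || N(0, 1)) summed over dimensions:
        # 0.5 * (sigma^2 + mu^2 - 1 - 2 * log(sigma)).
        return sum(0.5 * (math.exp(2.0 * ls) + m * m - 1.0 - 2.0 * ls)
                   for m, ls in zip(self.mu, self.log_sigma))


def regularized_objective(value_estimate: float,
                          prior: DiagonalGaussianPrior,
                          alpha: float) -> float:
    # Assumed objective: favor plans the critic rates highly, minus an
    # alpha-weighted penalty for drifting away from the standard prior.
    return value_estimate - alpha * prior.kl_to_standard_normal()


if __name__ == "__main__":
    prior = DiagonalGaussianPrior([1.0], [0.0])
    print(prior.kl_to_standard_normal())             # 0.5
    print(regularized_objective(2.0, prior, 0.5))    # 1.75
```

Keeping the KL term in closed form avoids a Monte Carlo estimate of the regularizer. Whether the actual method uses this parameterization or this objective should be checked against the paper and the released code.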
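The Dataset Splits row describes a two-level averaging protocol: returns are averaged over evaluation trajectories (10 for MuJoCo, 100 for Kitchen, AntMaze, and Maze2D) within each of 5 independently trained models, and then across models. A minimal sketch of that aggregation follows. The function name is hypothetical, and the standard error across models is an added assumption, since the row does not say which spread statistic is reported.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def aggregate_returns(
    returns_per_model: Sequence[Sequence[float]],
) -> Tuple[float, float]:
    """Average evaluation returns over trajectories within each model,
    then over independently trained models.

    Returns (mean across models, standard error across models).
    """
    if len(returns_per_model) < 2:
        raise ValueError("need at least two trained models for a standard error")
    # Per-model mean over that model's evaluation trajectories.
    model_means = [mean(traj_returns) for traj_returns in returns_per_model]
    overall = mean(model_means)
    # Assumed spread statistic: standard error of the mean across models.
    sem = stdev(model_means) / math.sqrt(len(model_means))
    return overall, sem


if __name__ == "__main__":
    # 5 models x 10 trajectories, mirroring the MuJoCo protocol.
    fake_returns = [[float(k)] * 10 for k in range(1, 6)]
    print(aggregate_returns(fake_returns))
```

With equal trajectory counts per model, the two-level mean equals the flat mean over all returns; the distinction only matters if some models were evaluated on more trajectories than others.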
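The Software Dependencies row is marked No because no library versions are given. A sketch of the kind of environment record that would close this gap, using only the Python standard library, is shown here. The package names in the default tuple are illustrative guesses and are not taken from the paper or its repository; `environment_report` is a hypothetical helper.

```python
import platform
from importlib import metadata


def environment_report(packages=("torch", "numpy", "gym", "d4rl")) -> dict:
    """Collect interpreter and package versions for a reproducibility log.

    The default package names are hypothetical examples only.
    """
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record absence explicitly rather than silently skipping it.
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Writing this report alongside each training run, or pinning the same versions in a requirements file, is what would let a replication match the original software stack.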