Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Tree-Guided Diffusion Planner

Authors: Hyeonseong Jeon, Cheolhong Min, Jaesik Park

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate TDP on three diverse tasks: maze gold-picking, robot arm block manipulation, and Ant Maze multi-goal exploration. TDP consistently outperforms state-of-the-art approaches on all tasks. The project page can be found at: tree-diffusion-planner.github.io.
Researcher Affiliation	Academia	Hyeonseong Jeon¹, Cheolhong Min¹, Jaesik Park¹,²; ¹Department of Computer Science & Engineering, ²Interdisciplinary Program of AI, Seoul National University
Pseudocode	Yes	Appendix A, B outline the overall TDP pipeline and present the full algorithms. We detail the core modules of TDP: state decomposition (Sec. 4.1), parent branching (Sec. 4.2), and subtree expansion (Sec. 4.3). Algorithm 1 State Decomposition (SD) ... Algorithm 2 Parent Branching ... Algorithm 3 Sub-Tree Expansion
Open Source Code	Yes	The project page can be found at: tree-diffusion-planner.github.io. Code with instructions to reproduce the main results is available on the project website.
Open Datasets	Yes	We extend the single gold-picking example [2] in the Maze2D environment [40] to a multi-task benchmark. Diffusion planners are pretrained on arbitrary block stacking demonstrations collected from PDDLStream [48]. We finally evaluate test-time multi-goal exploration capability on Ant Maze [40].
Dataset Splits	No	The paper does not provide explicit training/test/validation dataset splits. Instead, it describes evaluation on tasks and benchmarks, specifying metrics, seeds, and task configurations, but not conventional data splits.
Hardware Specification	Yes	All experiments were conducted using a single NVIDIA GeForce RTX 3090 GPU.
Software Dependencies	No	The paper does not provide specific software dependencies with version numbers (e.g., Python X.X, PyTorch X.X) in the main text or appendices.
Experiment Setup	Yes	All experimental hyperparameters are reported in Appendix D. Table 4: Hyperparameters of three tasks. Maze2D Gold-picking: maze2d-medium planning horizon T_pred = 256; maze2d-medium maximum steps T_max = 600; maze2d-large planning horizon T_pred = 384; maze2d-large maximum steps T_max = 800; threshold distance = 0.3; gradient guidance strength α_g = 62.5; particle guidance strength α_p = 0.1; diffusion steps N = N_f = 256; number of samples B = 128.
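The Maze2D gold-picking values quoted in the Experiment Setup row can be collected into a typed configuration object, which makes the per-maze differences (planning horizon and step budget) explicit. This is a minimal sketch, not the authors' code: the class name `GoldPickingConfig`, its field names, and the assumption that the values without a maze prefix (threshold distance, α_g, α_p, N, B) are shared by both maze sizes are all hypothetical.

```python
from dataclasses import dataclass


# Hypothetical container for the Maze2D gold-picking hyperparameters quoted from Table 4.
# Field names are illustrative; they are not taken from the TDP code base.
@dataclass(frozen=True)
class GoldPickingConfig:
    env_name: str                      # e.g. "maze2d-medium" or "maze2d-large"
    planning_horizon: int              # T_pred
    max_steps: int                     # T_max
    # Assumption: the following values apply to both maze sizes.
    threshold_distance: float = 0.3
    gradient_guidance: float = 62.5    # alpha_g
    particle_guidance: float = 0.1     # alpha_p
    diffusion_steps: int = 256         # N = N_f
    num_samples: int = 128             # B

    def __post_init__(self) -> None:
        # Sanity check: a single plan should not exceed the episode step budget.
        if self.planning_horizon > self.max_steps:
            raise ValueError("planning_horizon must not exceed max_steps")


CONFIGS = {
    "maze2d-medium": GoldPickingConfig("maze2d-medium", planning_horizon=256, max_steps=600),
    "maze2d-large": GoldPickingConfig("maze2d-large", planning_horizon=384, max_steps=800),
}

if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        print(name, cfg.planning_horizon, cfg.max_steps, cfg.num_samples)
```

Using a frozen dataclass keeps each configuration immutable once constructed, so a run cannot silently change a hyperparameter midway; the shared defaults would need to be overridden per maze if the paper's Appendix D assigns them differently.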