Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Compositional Monte Carlo Tree Diffusion for Extendable Planning

Authors: Jaesik Yoon, Hyeonseo Cho, Sungjin Ahn

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that the proposed C-MCTD methods significantly outperform standard replanning strategies based on MCTD across a variety of task settings. Notably, Preplan Composer achieves perfect success on the challenging pointmaze-giant task, which requires generating plans approximately 10× longer than the trajectories seen during training. We conduct a comprehensive evaluation on tasks from the Offline Goal-conditioned RL benchmark (OGBench) [28], following MCTD's experimental setup [32]. Our evaluation spans point and ant maze navigation with extended horizons, multi-cube robot arm manipulation, and partially observable visual maze tasks. We report mean success rate (%) and planning time (seconds), averaged over 50 runs (5 tasks × 10 seeds), with complete configuration details in Appendix A.
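The reported metrics pool 50 runs (5 tasks × 10 seeds) per setting. A minimal sketch of that aggregation, assuming each run is stored as a hypothetical `(task_id, seed, success, planning_time_s)` tuple; the record layout, the `summarize` helper, and the dummy data are illustrative assumptions, not the paper's evaluation code or results.

```python
from statistics import mean

# Hypothetical per-run record: (task_id, seed, success, planning_time_s).
def summarize(runs):
    """Return (mean success rate in %, mean planning time in seconds)."""
    if not runs:
        raise ValueError("no runs to summarize")
    # Success is a boolean per run; the pooled mean over runs gives the rate.
    success_rate = 100.0 * mean(1.0 if r[2] else 0.0 for r in runs)
    planning_time = mean(r[3] for r in runs)
    return success_rate, planning_time

if __name__ == "__main__":
    # Dummy data only (not results from the paper): 5 tasks x 10 seeds.
    runs = [(task, seed, (task + seed) % 2 == 0, 1.5)
            for task in range(5) for seed in range(10)]
    assert len(runs) == 50
    rate, t = summarize(runs)
    print(f"success rate: {rate:.1f}%, planning time: {t:.2f}s")
```

Because every task contributes the same number of seeds, pooling all 50 runs gives the same mean as first averaging per task and then across tasks.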
Researcher Affiliation | Collaboration | Jaesik Yoon (KAIST & SAP), Hyeonseo Cho (KAIST), Sungjin Ahn (KAIST & NYU)
Pseudocode | Yes | The core instantiation of this framework, referred to as Online Composer, extends the tree search of MCTD to plan composition through three key components. First, stitching-based tree extension connects individual diffusion-generated plans into a longer, coherent plan, enabling global reasoning beyond the limitations of isolated plan generation. Second, guidance sets as meta-actions provide configurable control parameters for the plan generation process. This mechanism enables the planner to generate targeted, adaptive, high-quality plans, balancing exploration and exploitation according to the given guidance set. Third, fast replanning for simulation quickly approximates the remaining trajectory segments using accelerated denoising methods, significantly reducing computational cost while preserving trajectory coherence during inference. While Online Composer demonstrates strong performance across diverse environments, its sequential search procedure becomes inefficient in large state spaces due to the exponential growth in the number of candidate plan combinations. To address this challenge, we introduce two specialized variants: Distributed Composer and Preplan Composer. Distributed Composer leverages parallel processing and plan sharing across multiple search trees to mitigate the combinatorial explosion of the search space. Preplan Composer, in contrast, preconstructs a plan graph offline, enabling more efficient inference-time planning by reducing online search overhead and improving overall performance. ... Algorithm 1: Online Composer ... Algorithm 2: Distributed Composer ... Algorithm 3: Preplan Composer ... Algorithm 4: Online Composer ... Algorithm 5: Distributed Composer (DC) ... Algorithm 6: Preplan Composer (PC), Building Plan Graph ... Algorithm 7: Preplan Composer Inference
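To make the "preconstruct a plan graph offline, then search it at inference" idea and plan stitching concrete, here is a generic sketch under strong assumptions: each plan segment is a precomputed list of hashable, mutually comparable states (standing in for a diffusion-generated sub-plan), a segment is an edge from its first to its last state, and inference is plain Dijkstra with cost equal to the number of transitions. The helpers `build_plan_graph` and `compose_plan` are hypothetical; this is not the paper's Algorithm 6 or 7, and it omits MCTD tree search, guidance sets, and diffusion entirely.

```python
import heapq
from collections import defaultdict

def build_plan_graph(segments):
    """Offline step (hypothetical): index each segment as an edge start -> end."""
    graph = defaultdict(list)
    for seg in segments:
        graph[seg[0]].append(seg)
    return graph

def compose_plan(graph, start, goal):
    """Inference step (hypothetical): Dijkstra over segments, then stitch them
    into one trajectory, keeping each shared junction state only once.
    Returns None if the goal is unreachable."""
    frontier = [(0, start)]
    best = {start: 0}
    parent = {}  # reached state -> segment used to reach it
    while frontier:
        cost, state = heapq.heappop(frontier)
        if state == goal:
            break
        if cost > best.get(state, float("inf")):
            continue  # stale queue entry
        for seg in graph.get(state, []):
            nxt, new_cost = seg[-1], cost + len(seg) - 1
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                parent[nxt] = seg
                heapq.heappush(frontier, (new_cost, nxt))
    if goal not in best:
        return None
    # Walk parent pointers back from the goal to recover the segment chain.
    chain, node = [], goal
    while node != start:
        seg = parent[node]
        chain.append(seg)
        node = seg[0]
    plan = [start]
    for seg in reversed(chain):
        plan.extend(seg[1:])  # drop the junction state already in the plan
    return plan
```

For example, with segments `[(0,0),(0,1),(0,2)]`, `[(0,2),(1,2),(2,2)]`, and a longer direct segment from `(0,0)` to `(2,2)`, `compose_plan` stitches the first two into `[(0,0),(0,1),(0,2),(1,2),(2,2)]`. Moving the expensive segment generation offline leaves only a cheap graph search at inference time, which is the trade-off the description attributes to Preplan Composer.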
Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: The source code will be prepared as a release version after the review process.
Open Datasets | Yes | To demonstrate the effectiveness of C-MCTD in compositional long-horizon planning, we conduct a comprehensive evaluation on tasks from the Offline Goal-conditioned RL benchmark (OGBench) [28], following MCTD's experimental setup [32]. Our evaluation spans point and ant maze navigation with extended horizons, multi-cube robot arm manipulation, and partially observable visual maze tasks. ... [28] Seohong Park, Kevin Frans, Benjamin Eysenbach, and Sergey Levine. OGBench: Benchmarking offline goal-conditioned RL. In International Conference on Learning Representations (ICLR), 2025.
Dataset Splits | No | The paper uses the "Offline Goal-conditioned RL benchmark (OGBench) [28]", "stitch datasets from Park et al. [28]", and "play datasets". It describes training trajectory lengths (e.g., 200 steps) and the sampling of sub-sequences or sub-trajectories for training, but it does not specify explicit training/validation/test splits (e.g., percentages, counts, or references to standard splits) for the datasets used.
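The entry mentions that training draws sub-sequences from fixed-length trajectories (e.g., 200 steps). A minimal sketch of one common way to do this, assuming a uniformly random start position and a hypothetical window length of 64; neither choice is taken from the paper. Passing an explicitly seeded `random.Random` keeps the sampling reproducible.

```python
import random

def sample_subtrajectory(trajectory, length, rng):
    """Return a random contiguous window of `length` steps from `trajectory`.

    Assumption: the start index is drawn uniformly over all valid positions.
    """
    if length > len(trajectory):
        raise ValueError("window longer than trajectory")
    start = rng.randrange(len(trajectory) - length + 1)
    return trajectory[start:start + length]

if __name__ == "__main__":
    trajectory = list(range(200))  # stand-in for a 200-step trajectory
    window = sample_subtrajectory(trajectory, 64, random.Random(0))
    print(window[0], window[-1], len(window))
```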
Hardware Specification | Yes | All experiments were conducted on high-performance hardware consisting of 8 NVIDIA RTX 4090 GPUs, 512GB system memory, and a 96-thread CPU.
Software Dependencies | No | The paper does not provide specific version numbers for software libraries, frameworks, or programming languages used (e.g., Python version, PyTorch version, CUDA version).
Experiment Setup | Yes | Our hyperparameter configuration is largely adopted from Yoon et al. [32]. For full reproducibility, we provide the detailed configurations for the baseline models in Tables 5 to 8. Hyperparameters introduced by our proposed C-MCTD are detailed in Section A.5 alongside their corresponding experimental setups. For the SSD baseline, we used the default configurations from its official public implementation. ... Table 5: Hyperparameters for the Diffuser baseline. Table 6: Hyperparameters for the Diffusion Forcing baseline. Table 7: Hyperparameters for the Monte Carlo Tree Diffusion (MCTD) baseline. Table 8: Hyperparameters for the value-learning policy baseline. Section A.5: Evaluation details (A.5.1 Long-horizon maze environments, A.5.2 Robot arm manipulation environment, A.5.3 Visual maze environment).