Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

State-Covering Trajectory Stitching for Diffusion Planners

Authors: Kyowoon Lee, Jaesik Choi

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments across diverse and challenging benchmark tasks show that SCoTS significantly enhances the stitching capabilities and long-horizon generalization of diffusion planners.
Researcher Affiliation	Academia	Kyowoon Lee (KAIST); Jaesik Choi (KAIST, INEEJI)
Pseudocode	Yes	Algorithm 1: Overview of the SCoTS Framework
Open Source Code	Yes	Our code is available at https://github.com/leekwoon/scots/
Open Datasets	Yes	We evaluate SCoTS on the OGBench benchmark (Park et al., 2025).
Dataset Splits	Yes	We evaluate SCoTS on the OGBench benchmark (Park et al., 2025), spanning diverse difficulties, environment sizes, agent state dimensions, and training data qualities. Specifically, the benchmark includes three locomotion environments: Point Maze (controlling a 2D point mass) and Ant Maze (controlling an 8-DoF quadrupedal Ant). We consider two distinct dataset types, each designed to evaluate specific challenges. The Stitch dataset comprises short, goal-reaching trajectories limited to four cell units, thus requiring the agent to stitch multiple segments (up to 8) for successful inference. In contrast, the Explore dataset assesses learning navigation behaviors from extensive yet low-quality exploratory trajectories, collected by frequently resampling random directions and injecting significant action noise. For each environment, we report the success rate averaged over all evaluation episodes, where an episode is considered successful if the agent comes within a predefined distance threshold of the goal state.
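The success-rate metric described above can be sketched as follows. This is an illustration under stated assumptions: states and goals are plain coordinate tuples, distance is Euclidean, and an episode counts as successful if any visited state comes within the threshold (the excerpt does not say whether only the final state is checked). The names `episode_success` and `success_rate` and the threshold value are hypothetical, not part of the OGBench API.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]
Episode = Tuple[Sequence[Point], Point]  # (visited states, goal); hypothetical layout


def episode_success(trajectory: Sequence[Point], goal: Point, threshold: float) -> bool:
    """True if any visited state lies within `threshold` (Euclidean) of the goal."""
    return any(math.dist(state, goal) <= threshold for state in trajectory)


def success_rate(episodes: List[Episode], threshold: float) -> float:
    """Fraction of episodes that reach the goal region, averaged over all episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    hits = sum(episode_success(traj, goal, threshold) for traj, goal in episodes)
    return hits / len(episodes)


if __name__ == "__main__":
    episodes = [
        ([(0.0, 0.0), (1.0, 1.0), (2.9, 3.0)], (3.0, 3.0)),  # ends 0.1 from the goal
        ([(0.0, 0.0), (0.5, 0.5)], (3.0, 3.0)),              # never gets close
    ]
    print(success_rate(episodes, threshold=0.5))  # 0.5
```

Whether the threshold check uses only the final state or any state matters little if the environment terminates the episode on success, which is assumed here.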
Hardware Specification	Yes	All experiments were conducted using a single NVIDIA A10 GPU.
Software Dependencies	No	We utilize DiT1D (Peebles & Xie, 2023) as the neural network backbone for both the diffusion planner and the stitcher, due to its large receptive field and effectiveness in modeling trajectory-level dependencies. Following prior studies (Dong et al., 2023; Lu et al., 2025), we employ a DiT1D architecture with a hidden dimension of 256, a head dimension of 32, and a total of 8 DiT blocks consistently across all environments. […] To achieve this, we employ an Inverted File (IVF) index from the Faiss library (Douze et al., 2024), which is specifically designed for large-scale similarity searches.
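The excerpt names Faiss's Inverted File (IVF) index without showing how it is configured. Faiss exposes this structure as `faiss.IndexIVFFlat`, which is trained with `train`, filled with `add`, queried with `search`, and tuned through its `nprobe` attribute. Because the SCoTS settings are not given here, the pure-Python `ToyIVFIndex` below is a hypothetical stand-in that illustrates the IVF idea, not Faiss's implementation: a k-means coarse quantizer splits the vectors into `nlist` cells, and a query scans only the `nprobe` cells nearest to it. That SCoTS uses it for the Top-k candidate search (k = 10 in Table 5) in the learned embedding space is an assumption.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _nearest_centroid(point: Vector, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


class ToyIVFIndex:
    """Hypothetical inverted-file index: k-means cells plus one list per cell."""

    def __init__(self, nlist: int, nprobe: int = 1, iters: int = 10, seed: int = 0):
        self.nlist, self.nprobe, self.iters = nlist, nprobe, iters
        self.rng = random.Random(seed)
        self.centroids: List[List[float]] = []
        self.lists: List[List[Tuple[int, Vector]]] = []
        self.ntotal = 0

    def train(self, data: List[Vector]) -> None:
        # Lloyd's k-means, initialized from randomly chosen data points.
        self.centroids = [list(p) for p in self.rng.sample(data, self.nlist)]
        for _ in range(self.iters):
            cells: List[List[Vector]] = [[] for _ in range(self.nlist)]
            for p in data:
                cells[_nearest_centroid(p, self.centroids)].append(p)
            for i, cell in enumerate(cells):
                if cell:  # an empty cell keeps its previous centroid
                    self.centroids[i] = [sum(xs) / len(cell) for xs in zip(*cell)]
        self.lists = [[] for _ in range(self.nlist)]
        self.ntotal = 0

    def add(self, data: List[Vector]) -> None:
        # Store each vector, tagged with a sequential id, in its nearest cell.
        for p in data:
            self.lists[_nearest_centroid(p, self.centroids)].append((self.ntotal, p))
            self.ntotal += 1

    def search(self, query: Vector, k: int) -> List[Tuple[float, int]]:
        # Probe the nprobe closest cells, then rank their members by exact distance.
        probe = sorted(range(self.nlist), key=lambda i: math.dist(query, self.centroids[i]))
        candidates = [
            (math.dist(query, p), idx)
            for cell in probe[: self.nprobe]
            for idx, p in self.lists[cell]
        ]
        return sorted(candidates)[:k]


if __name__ == "__main__":
    rng = random.Random(1)
    points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    points += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(200)]
    index = ToyIVFIndex(nlist=4, nprobe=2)
    index.train(points)
    index.add(points)
    print(index.search((0.0, 0.0), k=3))  # list of (distance, id) pairs
```

Setting `nprobe` equal to `nlist` makes the search exhaustive and therefore exact; a smaller `nprobe` trades recall for speed, which is what makes IVF attractive for large-scale similarity search.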
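Experiment Setup	Yes	A full list of the hyperparameters is reported in Table 5.

Table 5: Hyperparameters for SCoTS.
Component	Hyperparameter	Value	Tuning Choices
SCoTS: Temporal Distance-Preserving Embedding (ϕ)	Learning Rate	3 × 10⁻⁴	-
SCoTS: Temporal Distance-Preserving Embedding (ϕ)	Latent Dimension	32	-
SCoTS: Temporal Distance-Preserving Embedding (ϕ)	Batch Size	1024	-
SCoTS: Temporal Distance-Preserving Embedding (ϕ)	Training Steps	1,000,000	-
SCoTS: Temporal Distance-Preserving Embedding (ϕ)	Network Backbone	MLP	-
SCoTS: Temporal Distance-Preserving Embedding (ϕ)	MLP Dimensions	(512, 512, 512)	-
SCoTS: Temporal Distance-Preserving Embedding (ϕ)	Expectile (ξ for ℓ₂^ξ)	0.95	-
SCoTS: Inverse Dynamics Model (for actions in D_aug)	Network Backbone	MLP	-
SCoTS: Inverse Dynamics Model (for actions in D_aug)	MLP Dimensions	(256, 256, 256)	-
SCoTS: Inverse Dynamics Model (for actions in D_aug)	Training Steps	200,000	-
SCoTS: Stitching Process Parameters	Top-k Candidates (Search)	10	-
SCoTS: Stitching Process Parameters	k_density (Novelty Score)	30	-
SCoTS: Stitching Process Parameters	Novelty Weight (β)	2.0	-
SCoTS: Stitching Process Parameters	Augmented Dataset Size	5M transitions	-
SCoTS: Stitching Process Parameters	N_stitch (Stitches per Traj.)	Task-dependent (e.g., 40)	-
SCoTS: Stitching Process Parameters	N_traj (Generated Traj.)	Task-dependent (e.g., 5,000)	-
SCoTS: Diffusion-based Stitcher (p_θ^stitcher)	Network Backbone	DiT1D	-
SCoTS: Diffusion-based Stitcher (p_θ^stitcher)	Learning Rate	2 × 10⁻⁴	-
SCoTS: Diffusion-based Stitcher (p_θ^stitcher)	Weight Decay	1 × 10⁻⁵	-
SCoTS: Diffusion-based Stitcher (p_θ^stitcher)	Batch Size	64	-
SCoTS: Diffusion-based Stitcher (p_θ^stitcher)	Training Steps	1,000,000	-
SCoTS: Diffusion-based Stitcher (p_θ^stitcher)	Solver	DDIM	-
SCoTS: Diffusion-based Stitcher (p_θ^stitcher)	Sampling Steps (DDIM)	20	-
SCoTS: Diffusion-based Stitcher (p_θ^stitcher)	Horizon (H_stitcher)	26	-
Hierarchical Diffusion Planner (HD)	Network Backbone	DiT1D	-
Hierarchical Diffusion Planner (HD)	Learning Rate	2 × 10⁻⁴	-
Hierarchical Diffusion Planner (HD)	Weight Decay	1 × 10⁻⁵	-
Hierarchical Diffusion Planner (HD)	Batch Size	64	-
Hierarchical Diffusion Planner (HD)	Training Steps	1,000,000	-
Hierarchical Diffusion Planner (HD)	Solver	DDIM	-
Hierarchical Diffusion Planner (HD)	Sampling Steps (DDIM)	20	-
Hierarchical Diffusion Planner (HD)	Plan Horizon (on original data)	101 (Stitch), 401 (Explore)	-
Hierarchical Diffusion Planner (HD)	Plan Horizon (on D_aug)	501 (M/L), 1001 (G/Explore)	-
Hierarchical Diffusion Planner (HD)	Temporal Jump	26	-
Execution Parameters	Low-level Controller Horizon	Tuned	{5, 10, 15, 20, 25}
Execution Parameters	Replanning Interval	Tuned	{50, 100, 200}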
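Table 5 lists an expectile ξ = 0.95 for the loss ℓ₂^ξ used when training the embedding ϕ. A minimal sketch, assuming ℓ₂^ξ is the standard asymmetric squared (expectile) loss from expectile regression, as also used in Implicit Q-Learning: ℓ₂^ξ(u) = |ξ − 1(u < 0)| u². How SCoTS forms the residual u for ϕ is not described in this excerpt, so the hypothetical helpers `expectile_loss` and `sample_expectile` only illustrate why ξ = 0.95 pulls an estimate toward the upper tail of a distribution.

```python
from typing import Sequence


def expectile_loss(u: float, xi: float) -> float:
    """l_2^xi(u) = |xi - 1(u < 0)| * u^2: residuals u >= 0 are weighted by xi,
    negative residuals by (1 - xi)."""
    weight = xi if u >= 0 else 1.0 - xi
    return weight * u * u


def sample_expectile(samples: Sequence[float], xi: float, tol: float = 1e-10) -> float:
    """Value t minimizing mean(expectile_loss(x - t, xi)), found by bisection.

    The derivative of the mean loss is proportional to -g(t), where
    g(t) = sum(w(x - t) * (x - t)) is non-increasing in t, so its root is
    bracketed by min(samples) and max(samples)."""
    def g(t: float) -> float:
        return sum((xi if x >= t else 1.0 - xi) * (x - t) for x in samples)

    lo, hi = min(samples), max(samples)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:  # the minimizer lies above mid
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    data = [0.0, 1.0]
    print(round(sample_expectile(data, xi=0.5), 6))   # 0.5 (the mean)
    print(round(sample_expectile(data, xi=0.95), 6))  # 0.95 (pulled toward the upper tail)
```

With ξ = 0.5 the loss is symmetric and the minimizer is the ordinary mean; raising ξ toward 1 penalizes under-estimates more heavily, which is the usual reason for choosing a high expectile.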