Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Generative Trajectory Stitching through Diffusion Composition

Authors: Yunhao Luo, Utkarsh Mishra, Yilun Du, Danfei Xu

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct extensive experiments across benchmark tasks of varying difficulty levels, including different environment sizes (from simple U-mazes to complex giant mazes), agent state dimensions (from 2D point agents to 50D humanoid robots), trajectory types (from maze navigation trajectories to ball dribbling trajectories), and training data quality (from clean demonstrations to noisy exploration data). Our results demonstrate that CompDiffuser significantly outperforms multiple imitation learning and offline reinforcement learning baselines across all settings.
Researcher Affiliation	Academia	Yunhao Luo¹, Utkarsh A. Mishra¹, Yilun Du², Danfei Xu¹ (¹Georgia Tech, ²Harvard University)
Pseudocode	Yes	Algorithm 1: Training CompDiffuser; Algorithm 2: Autoregressive Trajectory Sampling
Open Source Code	No	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We will release our code implementation and corresponding instructions upon acceptance.
Open Datasets	Yes	In this paper, we directly evaluate our method on public stitching datasets introduced in two recent papers, Ghugare et al. [21] and OGBench [51]. ... We use the environments and datasets from their official implementation release at https://github.com/RajGhugare19/stitching-is-combinatorial-generalisation. ... All methods are trained on the OGBench public release datasets.
Dataset Splits	No	The paper describes how training data for Ghugare et al. [21] datasets are curated by dividing each environment into small regions with overlap, and how OGBench datasets are constrained by travel distance. It also describes evaluation tasks: "in Ghugare et al. [21] datasets, we evaluate on 2 tasks in U-Maze, 6 tasks in Medium, and 7 tasks in Large with 10 episodes per task; in OGBench [51], we evaluate on 5 tasks in each environment with 20 episodes per task." However, it does not explicitly state specific training/validation/test splits (e.g., percentages or counts) for the dataset used to train the models.
Hardware Specification	Yes	Hardware: We use 1 NVIDIA GPU for each experiment. A GPU with 24GB memory is sufficient to train our models and it takes 1-2 days to train a model using a recent mid-level NVIDIA GPU. ... In Table 10, ... using one Nvidia L40S GPU (unit: second).
Software Dependencies	Yes	Software: The computation platform is installed with Ubuntu 20.04.6, Python 3.9.20, PyTorch 2.5.0.
Experiment Setup	Yes	Table 12: Hyperparameters for Training on Point Maze Giant Stitch environment. ... Horizon: 160; Diffusion Time Step: 512; Probability of Condition Dropout: 0.2; Iterations: 1.2M; Batch Size: 128; Optimizer: Adam; Learning Rate: 2e-4; U-Net Base Dim: 128; U-Net Encoder Dims: (128, 256, 512, 1024)
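For anyone attempting a reproduction, the training hyperparameters (Table 12) and the evaluation protocol quoted in the Dataset Splits row can be gathered into one config. The sketch below is hypothetical. The class names, field names, and environment keys are illustrative assumptions and are not taken from the authors' code, which this index scores as not openly available. Only the numeric values come from the excerpts above. The OGBench count is per environment, because the paper reports 5 tasks in each environment.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters quoted from Table 12 (Point Maze Giant Stitch).

    Hypothetical field names; only the values come from the paper excerpt.
    """
    horizon: int = 160
    diffusion_steps: int = 512
    cond_dropout_prob: float = 0.2
    iterations: int = 1_200_000
    batch_size: int = 128
    optimizer: str = "Adam"
    learning_rate: float = 2e-4
    unet_base_dim: int = 128
    unet_encoder_dims: Tuple[int, ...] = (128, 256, 512, 1024)


@dataclass(frozen=True)
class EvalProtocol:
    """Number of evaluation tasks per environment and episodes run per task."""
    tasks: Dict[str, int]
    episodes_per_task: int

    def total_episodes(self) -> int:
        # Every task is rolled out episodes_per_task times.
        return sum(self.tasks.values()) * self.episodes_per_task


# Ghugare et al. [21] datasets: 2 / 6 / 7 tasks, 10 episodes per task.
GHUGARE_EVAL = EvalProtocol(
    tasks={"umaze": 2, "medium": 6, "large": 7},  # hypothetical keys
    episodes_per_task=10,
)

# OGBench [51]: 5 tasks in each environment, 20 episodes per task.
OGBENCH_EVAL_PER_ENV = EvalProtocol(tasks={"any_env": 5}, episodes_per_task=20)


if __name__ == "__main__":
    cfg = TrainConfig()
    # The first encoder width equals the base dim; later widths double.
    assert cfg.unet_encoder_dims[0] == cfg.unet_base_dim
    print(cfg)
    print("Ghugare et al. episodes:", GHUGARE_EVAL.total_episodes())
    print("OGBench episodes per environment:", OGBENCH_EVAL_PER_ENV.total_episodes())
```

Freezing the dataclasses prevents a single run from silently changing the settings. Keeping the evaluation counts next to the training values also makes the workload explicit: 150 episodes across the Ghugare et al. mazes, and 100 per OGBench environment.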