Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Dynamic Test-Time Compute Scaling in Control Policy: Difficulty-Aware Stochastic Interpolant Policy

Authors: Inkook Chun, Seungjae Lee, Michael Albergo, Saining Xie, Eric Vanden-Eijnden

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through comprehensive benchmarks across diverse manipulation tasks, DA-SIP achieves a 2.6-4.4× reduction in total computation time while maintaining task success rates comparable to fixed maximum-computation baselines.
Researcher Affiliation	Collaboration	New York University, University of Maryland, Harvard University, Capital Fund Management
Pseudocode	No	The paper describes methods and architecture details, but does not include structured pseudocode or algorithm blocks. For instance, Table 7 describes the Lightweight CNN architecture and training parameters, but not as pseudocode.
Open Source Code	No	We thank the authors of Stochastic Interpolant and the SiT for their paper and codebase, which served as the foundation of our research [9, 11]. We are also grateful for the authors of the Diffusion Policy codebase [16], which was instrumental for our implementation.
Open Datasets	Yes	Our evaluation spans diverse simulation environments: RoboMimic: benchmark suite for imitation learning in manipulation (Can, Lift, Square, Tool Hang tasks) [26]; Block Push: a non-prehensile manipulation task (from the Fetch suite) in which a 7-DoF arm must push a cubic block across a tabletop to a target pose, requiring precise contact planning and continuous control [27]; Push-T: precision manipulation task requiring continuous control [28]; Kitchen: multi-stage environment with sequential task completion [29]; Multimodal Ant: locomotion tasks requiring complex coordination [30].
Dataset Splits	Yes	This supervised approach is trained on 300 annotated images per task (from a total dataset of 2000 timesteps per task), providing efficient real-time classification with minimal computational overhead (~20 ms per inference). ... Test set: 20% of data (stratified split)
Hardware Specification	Yes	Training is performed on NVIDIA L40S GPUs.
Software Dependencies	No	The paper mentions software components like 'Optimizer AdamW' and 'Loss function MSE', and specific VLM models like 'Qwen2.5-VL-7B-Instruct', but does not provide version numbers for general software dependencies such as programming languages or deep learning frameworks (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup	Yes	Table 6 (stochastic interpolant policy hyperparameters): Optimizer: AdamW; Weight decay: 1e-6; Learning rate: 1e-4; Schedule: cosine decay; Batch size: 256; Gradient clipping: 1.0; Training epochs: 5,000; Checkpointing: every 50 epochs; Loss function: MSE; EMA rate: 0.9999; Prediction targets: velocity/score/noise; LR scheduler: cosine decay; Interpolants: Linear/VP/GVP; Warmup steps: 500
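Table 6 names a learning-rate schedule with 500 warmup steps and cosine decay, gradient clipping at 1.0, and an EMA rate of 0.9999. It does not state the exact form of any of these. Below is a minimal pure-Python sketch of one common reading. It assumes linear warmup, cosine decay to zero, global-norm clipping, and a standard weight EMA. The function names and the 10,000-step horizon are hypothetical and are not taken from the authors' code.

```python
import math

# Values as reported in Table 6; the schedule/clipping shapes below are assumptions.
BASE_LR = 1e-4
WARMUP_STEPS = 500
CLIP_NORM = 1.0
EMA_RATE = 0.9999


def lr_at(step: int, total_steps: int, base_lr: float = BASE_LR,
          warmup_steps: int = WARMUP_STEPS) -> float:
    """Assumed shape: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        # Ramp linearly up to base_lr over the warmup phase.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


def clip_by_global_norm(grads: list[float], max_norm: float = CLIP_NORM) -> list[float]:
    """Assumed global-norm clipping: rescale all gradients if their L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


def ema_update(ema_params: list[float], params: list[float],
               rate: float = EMA_RATE) -> list[float]:
    """Exponential moving average of weights: ema <- rate * ema + (1 - rate) * param."""
    return [rate * e + (1.0 - rate) * p for e, p in zip(ema_params, params)]


if __name__ == "__main__":
    total = 10_000  # hypothetical total number of optimizer steps
    for s in (0, 499, 500, 5_250, 9_999):
        print(s, f"{lr_at(s, total):.2e}")
    print(clip_by_global_norm([3.0, 4.0]))
    print(ema_update([0.0], [1.0]))
```

Under these assumptions, the rate peaks at 1e-4 at the end of warmup and falls to half that halfway through the decay phase. Frameworks such as PyTorch provide built-in schedulers and norm clipping that a real training loop would more likely use.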
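The Dataset Splits entry reports a stratified 20% test split of the 300 annotated images per task. Below is a minimal sketch of such a split. It assumes each class is held out separately, with the per-class count rounded to the nearest integer. The function name, the seed, and the "easy"/"hard" labels and their counts are hypothetical illustrations, not the paper's data.

```python
import random
from collections import defaultdict


def stratified_split(labels: list[str], test_frac: float = 0.2,
                     seed: int = 0) -> tuple[list[int], list[int]]:
    """Return (train_idx, test_idx), holding out test_frac of each class."""
    by_class: dict[str, list[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train: list[int] = []
    test: list[int] = []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    # Hypothetical class balance for 300 annotated images.
    labels = ["easy"] * 180 + ["hard"] * 120
    train_idx, test_idx = stratified_split(labels)
    print(len(train_idx), len(test_idx))
```

With these hypothetical counts, the split holds out 36 "easy" and 24 "hard" images. Both classes therefore keep their original proportions in the 60-image test set.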