Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Latent Policy Barrier: Learning Robust Visuomotor Policies by Staying In-Distribution

Authors: Zhanyi Sun, Shuran Song

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Both simulated and real-world experiments show that LPB improves both policy robustness and data efficiency, enabling reliable manipulation from limited expert data and without additional human correction or annotation.
Researcher Affiliation	Academia	Zhanyi Sun Shuran Song Stanford University project-latentpolicybarrier.github.io
Pseudocode	Yes	Algorithm 1 Latent Policy Barrier (Inference time)
Open Source Code	Yes	Code and data for reproducing the result will be made publicly available.
Open Datasets	Yes	For simulation experiments, the demonstration data is taken from public datasets ([41, 12]).
Dataset Splits	Yes	For each Robomimic task (Square, Tool-Hang, Transport) and for Push-T, we keep 20% of the original expert demonstrations. For each task, a base diffusion policy is trained on these demonstrations. For Libero10, we use all 50 provided demonstrations for each of the ten tasks to train a language-conditioned, multi-task base diffusion policy.
Hardware Specification	Yes	All simulated experiments are run on a single NVIDIA L40S GPU (46 GB VRAM). ... The dynamics model is trained in parallel on six NVIDIA L40S GPUs and converges in approximately 36 h.
Software Dependencies	No	The paper mentions software components like "Diffusion Policy", "ResNet-18", "U-Net", "Vision Transformer (ViT)", and low-level controllers from GitHub, but does not provide specific version numbers for these software dependencies as required.
Experiment Setup	Yes	Task-specific and shared hyperparameters are provided in Table 5 and Table 6, respectively. ... Training hyperparameters for fϕ are provided in Table 8. ... The OOD threshold τ is chosen empirically by rolling out the final policy checkpoint, while the guidance scale η is selected via a grid search. Both η and τ for each task are listed in Table 9.
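
The Experiment Setup row states that the guidance scale η is selected via a grid search. The following is a minimal sketch of what such a search could look like, not the authors' procedure: the function names, the candidate values, and the scoring function are all hypothetical, and `toy_success_rate` merely stands in for rolling out a policy and measuring task success.

```python
from typing import Callable, Iterable, Tuple


def grid_search_guidance_scale(
    candidates: Iterable[float],
    evaluate: Callable[[float], float],
) -> Tuple[float, float]:
    """Return (best_eta, best_score) over the candidate guidance scales.

    `evaluate` is a hypothetical callback that maps a guidance scale to a
    scalar score (higher is better), e.g. a rollout success rate.
    """
    best_eta = None
    best_score = float("-inf")
    for eta in candidates:
        score = evaluate(eta)
        # Keep the first candidate that achieves the highest score so far.
        if score > best_score:
            best_eta, best_score = eta, score
    if best_eta is None:
        raise ValueError("candidates must not be empty")
    return best_eta, best_score


def toy_success_rate(eta: float) -> float:
    # Assumption: a stand-in objective, not a real rollout.
    # It peaks at eta = 1.0 purely by construction.
    return 1.0 - (eta - 1.0) ** 2


# Hypothetical candidate grid; the actual per-task values are in Table 9 of the paper.
best_eta, best_score = grid_search_guidance_scale(
    [0.25, 0.5, 1.0, 2.0], toy_success_rate
)
```

In practice the `evaluate` callback would be the expensive part, since each candidate requires a batch of policy rollouts, so the grid is usually kept small.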