Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Latent Policy Barrier: Learning Robust Visuomotor Policies by Staying In-Distribution
Authors: Zhanyi Sun, Shuran Song
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Both simulated and real-world experiments show that LPB improves both policy robustness and data efficiency, enabling reliable manipulation from limited expert data and without additional human correction or annotation. |
| Researcher Affiliation | Academia | Zhanyi Sun Shuran Song Stanford University project-latentpolicybarrier.github.io |
| Pseudocode | Yes | Algorithm 1 Latent Policy Barrier (Inference time) |
| Open Source Code | Yes | Code and data for reproducing the result will be made publicly available. |
| Open Datasets | Yes | For simulation experiments, the demonstration data is taken from public datasets ([41, 12]). |
| Dataset Splits | Yes | For each Robomimic task (Square, Tool-Hang, Transport) and for Push-T, we keep 20% of the original expert demonstrations. For each task, a base diffusion policy is trained on these demonstrations. For Libero10, we use all 50 provided demonstrations for each of the ten tasks to train a language-conditioned, multi-task base diffusion policy. |
| Hardware Specification | Yes | All simulated experiments are run on a single NVIDIA L40S GPU (46 GB VRAM). ... The dynamics model is trained in parallel on six NVIDIA L40S GPUs and converges in approximately 36 h. |
| Software Dependencies | No | The paper mentions software components like "Diffusion Policy", "ResNet-18", "U-Net", "Vision Transformer (ViT)", and low-level controllers from GitHub, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | Task-specific and shared hyperparameters are provided in Table 5 and Table 6, respectively. ... Training hyperparameters for fϕ are provided in Table 8. ... The OOD threshold τ is chosen empirically by rolling out the final policy checkpoint, while the guidance scale η is selected via a grid search. Both η and τ for each task are listed in Table 9. |
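
The Dataset Splits row says that 20% of the original expert demonstrations are kept for each Robomimic task and for Push-T. A minimal sketch of such a subsampling step is shown below. It rests on assumptions the table does not state: that the subset is drawn uniformly at random with a fixed seed, and that demonstrations are addressed by string identifiers. The function name `subsample_demos`, the seed, and the count of 200 demonstrations are hypothetical and chosen only for illustration.

```python
import random


def subsample_demos(demo_ids, fraction=0.2, seed=0):
    """Return a reproducible subset holding `fraction` of the demonstrations.

    Assumption: uniform random selection with a fixed seed. The paper's
    actual selection rule is not given in the table above.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # local generator, leaves global state untouched
    n_keep = max(1, round(len(demo_ids) * fraction))
    # Sort so the result does not depend on sampling order.
    return sorted(rng.sample(list(demo_ids), n_keep))


# Hypothetical example: 200 demonstrations, keep 20% of them.
demo_ids = [f"demo_{i}" for i in range(200)]
kept = subsample_demos(demo_ids)
print(len(kept))  # 40
```

Fixing the seed means a rerun selects the same demonstrations, which matters when a reduced-data result is compared across methods.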