Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Efficient Robotic Policy Learning via Latent Space Backward Planning
Authors: Dongxiu Liu, Haoyi Niu, Zhihao Wang, Jinliang Zheng, Yinan Zheng, Zhonghong Ou, Jianming Hu, Jianxiong Li, Xianyuan Zhan
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive simulation and real-robot long-horizon experiments, we show that LBP outperforms existing fine-grained and forward planning methods, achieving SOTA performance. |
| Researcher Affiliation | Academia | ¹Tsinghua University, ²Beijing University of Posts and Telecommunications, ³Peking University, ⁴Shanghai AI Lab. Correspondence to: Zhonghong Ou <EMAIL>, Jianming Hu <EMAIL>, Xianyuan Zhan <EMAIL>. |
| Pseudocode | No | The paper describes the practical algorithm in Section 4.4 in prose and equations, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | Project Page: https://lbp-authors.github.io. This is a project page; it neither explicitly states that code is released nor links directly to a code repository, as the guidelines require. |
| Open Datasets | Yes | Using previously validated evaluation recipes for embodied AI (Black et al., 2024; Kim et al., 2024; Tian et al., 2025; Zheng et al., 2025), we assess the performance of the proposed LBP. Specifically, we assess LBP on both the LIBERO-LONG simulation benchmark and a real-robot environment with long-horizon, multi-stage tasks. LIBERO-LONG (Liu et al., 2024) consists of 10 distinct long-horizon robotic manipulation tasks |
| Dataset Splits | No | All models are trained on 50 unique expert demonstrations for each task. All models are trained using 200 expert demonstrations for the task Move cups and Shift cups, and a total of 200 expert demonstrations for Stack 3 cups and Stack 4 cups. For real-world experiments, we evaluate the last three checkpoints, with each checkpoint being tested across 10 rollouts per task to provide an average score at each stage. |
| Hardware Specification | Yes | The high-level image-editing diffusion model is trained on video data using four A6000 GPUs. All experimental evaluations are conducted with a 6-DoF AIRBOT robotic arm, together with three different views provided by Logitech C922PRO cameras. |
| Software Dependencies | No | The paper mentions software components such as ResNet-34, FiLM conditioning layers, diffusion loss, the AdamW optimizer, DecisionNCE, SigLIP, and CLIP, but does not provide specific version numbers for these tools or libraries. |
| Experiment Setup | Yes | Unless otherwise stated, we adopt a three-step planning scheme (predicting a final goal and two intermediate subgoals) of LBP and set the planning coefficient λ = 0.5. The policy is optimized with diffusion loss to model complex distributions (Chi et al., 2023), with the denoising step fixed at 25. For training the high-level planner, we use a batch size of 64 and train for 100k steps with the AdamW optimizer. For the low-level policy on LIBERO-LONG, we set the batch size to 64 and train for 200k steps. In the case of the low-level policy for real-world robot experiments, we increase the batch size to 128 and train for 400k steps. |
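
The Experiment Setup row lists enough hyperparameters to collect into a single configuration. The sketch below does that in Python. The `LBPSetup` class, its field names, and the `subgoal_fractions`/`subgoal_offsets` helpers are hypothetical and not taken from any released code. The subgoal placement in particular is an assumption: it reads the planning coefficient λ as placing each successive subgoal at a fraction λ of the previous one's distance from the current state. Under that reading, a three-step scheme with λ = 0.5 plans the final goal first, then a subgoal halfway there, then one a quarter of the way. The paper itself should be consulted for the exact rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LBPSetup:
    """Hyperparameters quoted in the Experiment Setup row (field names are hypothetical)."""

    planning_steps: int = 3            # final goal + two intermediate subgoals
    planning_coeff: float = 0.5        # planning coefficient lambda
    denoising_steps: int = 25          # diffusion-policy denoising steps
    planner_batch_size: int = 64       # high-level planner, trained with AdamW
    planner_train_steps: int = 100_000
    policy_batch_size: int = 64        # low-level policy on LIBERO-LONG
    policy_train_steps: int = 200_000


# Real-robot low-level policy: larger batch and longer schedule, all else unchanged.
REAL_ROBOT = LBPSetup(policy_batch_size=128, policy_train_steps=400_000)


def subgoal_fractions(setup: LBPSetup) -> list[float]:
    """ASSUMPTION (not confirmed by the source): the k-th planned latent sits at
    fraction lambda**k of the remaining horizon, k = 0 being the final goal.
    The list is ordered far to near, i.e. in backward-planning order."""
    return [setup.planning_coeff ** k for k in range(setup.planning_steps)]


def subgoal_offsets(setup: LBPSetup, horizon: int) -> list[int]:
    """Convert the assumed fractions into integer step offsets (at least 1)."""
    return [max(1, round(f * horizon)) for f in subgoal_fractions(setup)]
```

With the defaults, `subgoal_fractions(LBPSetup())` returns `[1.0, 0.5, 0.25]`, and `subgoal_offsets(LBPSetup(), 100)` returns `[100, 50, 25]`. Keeping the real-robot variant as a second frozen instance makes its only differences from the LIBERO-LONG setup (batch size 128, 400k steps) explicit.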
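
The Dataset Splits row describes the real-world evaluation protocol: the last three checkpoints are each tested over 10 rollouts per task, and the results are averaged into one score per stage. A minimal Python sketch of that averaging follows. It assumes each rollout is recorded as the number of consecutive stages it completed and that a stage's score is the fraction of rollouts reaching it; the function name and input layout are hypothetical.

```python
from statistics import mean


def stage_scores(stages_completed: list[list[int]], num_stages: int) -> list[float]:
    """Average per-stage success over checkpoints.

    stages_completed[c][r] is the number of consecutive stages finished in
    rollout r of checkpoint c (ASSUMED encoding). Returns one score per stage:
    the fraction of rollouts that reached that stage, averaged over checkpoints.
    """
    if not stages_completed or any(not runs for runs in stages_completed):
        raise ValueError("every checkpoint needs at least one rollout")
    per_checkpoint = [
        [sum(n >= stage for n in runs) / len(runs) for stage in range(1, num_stages + 1)]
        for runs in stages_completed
    ]
    # zip(*...) groups the same stage across checkpoints before averaging.
    return [mean(scores) for scores in zip(*per_checkpoint)]


# Example: 3 checkpoints x 10 rollouts for a hypothetical 3-stage task.
example = [
    [3] * 10,           # checkpoint 1: every rollout finishes all stages
    [3] * 5 + [2] * 5,  # checkpoint 2: half the rollouts stop after stage 2
    [2] * 10,           # checkpoint 3: no rollout finishes stage 3
]
print(stage_scores(example, num_stages=3))  # prints [1.0, 1.0, 0.5]
```

Averaging within each checkpoint first and then across checkpoints weights every checkpoint equally. When each checkpoint receives the same 10 rollouts, as described, this equals pooling all 30 rollouts per task.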