Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Efficient Robotic Policy Learning via Latent Space Backward Planning
Authors: Dongxiu Liu, Haoyi Niu, Zhihao Wang, Jinliang Zheng, Yinan Zheng, Zhonghong Ou, Jianming Hu, Jianxiong Li, Xianyuan Zhan
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive simulation and real-robot long-horizon experiments, we show that LBP outperforms existing fine-grained and forward planning methods, achieving SOTA performance. |
| Researcher Affiliation | Academia | 1Tsinghua University 2Beijing University of Posts and Telecommunications 3Peking University 4Shanghai AI Lab. Correspondence to: Zhonghong Ou <EMAIL>, Jianming Hu <EMAIL>, Xianyuan Zhan <EMAIL>. |
| Pseudocode | No | The paper describes the practical algorithm in section 4.4 in prose and with equations, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | Project Page: https://lbp-authors.github.io. This is a project page, which does not explicitly state code release or link directly to a code repository as required by the guidelines. |
| Open Datasets | Yes | Using previously validated evaluation recipes for embodied AI (Black et al., 2024; Kim et al., 2024; Tian et al., 2025; Zheng et al., 2025), we assess the performance of the proposed LBP. Specifically, we assess LBP on both the LIBERO-LONG simulation benchmark and a real-robot environment with long-horizon, multi-stage tasks. LIBERO-LONG (Liu et al., 2024) consists of 10 distinct long-horizon robotic manipulation tasks |
| Dataset Splits | No | All models are trained on 50 unique expert demonstrations for each task. All models are trained using 200 expert demonstrations for the tasks Move cups and Shift cups, and a total of 200 expert demonstrations for Stack 3 cups and Stack 4 cups. For real-world experiments, we evaluate the last three checkpoints, with each checkpoint being tested across 10 rollouts per task to provide an average score at each stage. |
| Hardware Specification | Yes | The high-level image-editing diffusion model is trained on video data using four A6000 GPUs. All experimental evaluations are conducted with a 6-DoF AIRBOT robotic arm, together with three different views provided by Logitech C922 PRO cameras. |
| Software Dependencies | No | The paper mentions software components such as ResNet-34, FiLM conditioning layers, diffusion loss, the AdamW optimizer, DecisionNCE, SigLIP, and CLIP, but does not provide specific version numbers for these tools or libraries. |
| Experiment Setup | Yes | Unless otherwise stated, we adopt a three-step planning scheme (predicting a final goal and two intermediate subgoals) of LBP and set the planning coefficient λ = 0.5. The policy is optimized with diffusion loss to model complex distributions (Chi et al., 2023), with the denoising step fixed at 25. For training the high-level planner, we use a batch size of 64 and train for 100k steps with the AdamW optimizer. For the low-level policy on LIBERO-LONG, we set the batch size to 64 and train for 200k steps. In the case of the low-level policy for real-world robot experiments, we increase the batch size to 128 and train for 400k steps. |
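The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. This is purely illustrative: the paper releases no code, so all field names below are assumptions, and only the numeric values come from the paper's reported setup.

```python
# Hypothetical configuration grouping the hyperparameters reported in the
# paper's experiment setup. Field names are illustrative assumptions;
# values are taken from the quoted setup description.
from dataclasses import dataclass


@dataclass(frozen=True)
class LBPConfig:
    num_subgoals: int = 2                 # three-step scheme: final goal + two subgoals
    planning_coefficient: float = 0.5     # λ
    denoising_steps: int = 25             # fixed diffusion denoising steps
    planner_batch_size: int = 64          # high-level planner
    planner_train_steps: int = 100_000
    policy_batch_size: int = 64           # low-level policy, LIBERO-LONG
    policy_train_steps: int = 200_000
    real_robot_batch_size: int = 128      # low-level policy, real robot
    real_robot_train_steps: int = 400_000


cfg = LBPConfig()
print(cfg.planning_coefficient, cfg.denoising_steps)
```

A frozen dataclass is used here only as a convenient way to keep the reported values in one place; the actual training code may organize them differently.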