Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Efficient Robotic Policy Learning via Latent Space Backward Planning

Authors: Dongxiu Liu, Haoyi Niu, Zhihao Wang, Jinliang Zheng, Yinan Zheng, Zhonghong Ou, Jianming Hu, Jianxiong Li, Xianyuan Zhan

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive simulation and real-robot long-horizon experiments, we show that LBP outperforms existing fine-grained and forward planning methods, achieving SOTA performance.
Researcher Affiliation | Academia | 1Tsinghua University 2Beijing University of Posts and Telecommunications 3Peking University 4Shanghai AI Lab. Correspondence to: Zhonghong Ou <EMAIL>, Jianming Hu <EMAIL>, Xianyuan Zhan <EMAIL>.
Pseudocode | No | The paper describes the practical algorithm in Section 4.4 in prose and equations, but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | Project page: https://lbp-authors.github.io. The project page does not explicitly state a code release or link directly to a code repository, as required by the guidelines.
Open Datasets | Yes | Using previously validated evaluation recipes for embodied AI (Black et al., 2024; Kim et al., 2024; Tian et al., 2025; Zheng et al., 2025), we assess the performance of the proposed LBP. Specifically, we assess LBP on both the LIBERO-LONG simulation benchmark and a real-robot environment with long-horizon, multi-stage tasks. LIBERO-LONG (Liu et al., 2024) consists of 10 distinct long-horizon robotic manipulation tasks.
Dataset Splits | No | All models are trained on 50 unique expert demonstrations per task. All models are trained using 200 expert demonstrations for the Move cups and Shift cups tasks, and a total of 200 expert demonstrations for Stack 3 cups and Stack 4 cups. For real-world experiments, we evaluate the last three checkpoints, with each checkpoint tested across 10 rollouts per task to provide an average score at each stage.
Hardware Specification | Yes | The high-level image-editing diffusion model is trained on video data using four A6000 GPUs. All experimental evaluations are conducted with a 6-DoF AIRBOT robotic arm, together with three different views provided by Logitech C922PRO cameras.
Software Dependencies | No | The paper mentions software components such as ResNet-34, FiLM conditioning layers, diffusion loss, the AdamW optimizer, DecisionNCE, SigLIP, and CLIP, but does not provide specific version numbers for these tools or libraries.
Experiment Setup | Yes | Unless otherwise stated, we adopt a three-step planning scheme (predicting a final goal and two intermediate subgoals) of LBP and set the planning coefficient λ = 0.5. The policy is optimized with diffusion loss to model complex distributions (Chi et al., 2023), with the denoising step fixed at 25. For training the high-level planner, we use a batch size of 64 and train for 100k steps with the AdamW optimizer. For the low-level policy on LIBERO-LONG, we set the batch size to 64 and train for 200k steps. In the case of the low-level policy for real-world robot experiments, we increase the batch size to 128 and train for 400k steps.
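As a reading aid, the hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is a non-authoritative summary of the reported values only; the field names, the dataclass structure, and the two-config split are my own and are not taken from the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LBPTrainConfig:
    """Hyperparameters as reported in the paper's setup (names are illustrative)."""
    # Planning scheme: one final goal plus two intermediate subgoals.
    num_intermediate_subgoals: int = 2
    planning_coefficient: float = 0.5   # lambda in the quoted setup
    denoising_steps: int = 25           # fixed diffusion denoising steps

    # High-level planner training.
    planner_batch_size: int = 64
    planner_train_steps: int = 100_000

    # Low-level policy training (defaults are the LIBERO-LONG values).
    policy_batch_size: int = 64
    policy_train_steps: int = 200_000


LIBERO = LBPTrainConfig()
# The real-robot runs double both the policy batch size and the training steps.
REAL_ROBOT = LBPTrainConfig(policy_batch_size=128, policy_train_steps=400_000)
```

The two instances make the only reported difference between the simulation and real-robot setups (policy batch size and step count) explicit; all other values are shared.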