Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Real-Time Execution of Action Chunking Flow Policies

Authors: Kevin Black, Manuel Galliker, Sergey Levine

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Results demonstrate that RTC is fast, performant, and uniquely robust to inference delay, significantly improving task throughput and enabling high success rates in precise tasks such as lighting a match even in the presence of significant latency. To test RTC, we introduce a new benchmark of 12 highly dynamic tasks in the Kinetix simulator, as well as evaluate 6 challenging real-world bimanual manipulation tasks.
Researcher Affiliation | Industry | Kevin Black (1,2), Manuel Y. Galliker (1), Sergey Levine (1,2); (1) Physical Intelligence, (2) UC Berkeley; EMAIL
Pseudocode | Yes | Algorithm 1 Real-Time Chunking
Open Source Code | Yes | The code for the simulated experiments is available at https://github.com/Physical-Intelligence/real-time-chunking-kinetix.
Open Datasets | No | To generate data for imitation learning, we first train expert policies using RPO [50] and a binary success reward. For each environment, we train 6 expert policies with different seeds and then generate a 1M transition dataset with a different policy selected each episode. The paper does not provide concrete access information for this generated dataset. Additionally, "The data used to train π0.5, as well as the real robot runtime code, are not released as these are proprietary."
Dataset Splits | No | To generate data for imitation learning, we first train expert policies using RPO [50] and a binary success reward. For each environment, we train 6 expert policies with different seeds and then generate a 1M transition dataset with a different policy selected each episode. We then train action chunking flow policies with a prediction horizon of H = 8 and a 4-layer MLP-Mixer [61] architecture for 32 epochs. The paper describes how data was generated and used for training, but does not specify train/test/validation splits for this dataset.
Hardware Specification | Yes | Inference runs on an NVIDIA RTX 4090 GPU using bfloat16 precision and n = 5 denoising steps. All the experiments in this work use no more than 8 NVIDIA H100 GPUs (one NVIDIA DGX server) at a time. H100s are used via a cloud provider. In the mobile manipulation case, this computer is an Intel NUC portable computer with a 12th Gen Intel i7-1260P processor. In the non-mobile case, this computer is a desktop workstation with an AMD Ryzen 9 7950X processor.
Software Dependencies | No | We first train expert policies using RPO [50] and a binary success reward. ... We then train action chunking flow policies with a prediction horizon of H = 8 and a 4-layer MLP-Mixer [61] architecture for 32 epochs. ... Model inference uses bfloat16 precision and n = 5 denoising steps. The paper mentions software tools and precision, but does not provide specific version numbers for software dependencies like programming languages or libraries.
Experiment Setup | Yes | We then train action chunking flow policies with a prediction horizon of H = 8 and a 4-layer MLP-Mixer [61] architecture for 32 epochs. ... We use π0.5 (H = 50, t = 20ms) with n = 5 denoising steps... Table 4: Hyperparameters used for RTC (Algorithm 1). n (denoising steps): 5, 5; H (prediction horizon): 8, 50; s_min (minimum execution horizon): 25; β (guidance weight clipping): 5, 5; b (delay buffer size): 10.
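
The rows above quote a prediction horizon of H = 8 and n = 5 denoising steps for the simulated flow policies. As a rough illustration of what "n denoising steps" can mean for a flow policy, the sketch below integrates a velocity field from Gaussian noise to an action chunk with n Euler steps. It is not the paper's Algorithm 1 or its released code: `toy_velocity`, `sample_chunk`, the scalar per-step actions, and the target chunk are all hypothetical stand-ins chosen so the example runs without a trained model.

```python
import random


def toy_velocity(x, tau, target):
    """Hypothetical stand-in for a learned velocity field.

    For a straight-line flow from noise to `target`, the velocity at
    flow time tau is (target - x) / (1 - tau).
    """
    return [(t - xi) / (1.0 - tau) for xi, t in zip(x, target)]


def sample_chunk(velocity, horizon, n_steps, seed=0):
    """Draw one action chunk of length `horizon` using `n_steps` Euler steps."""
    rng = random.Random(seed)
    # Start from standard Gaussian noise, one scalar action per time step
    # (real policies use vector-valued actions; scalars keep the sketch short).
    x = [rng.gauss(0.0, 1.0) for _ in range(horizon)]
    for k in range(n_steps):
        tau = k / n_steps  # flow time in [0, 1); never reaches 1
        v = velocity(x, tau)
        # Euler update with step size 1 / n_steps.
        x = [xi + vi / n_steps for xi, vi in zip(x, v)]
    return x


H = 8  # prediction horizon quoted for the simulated policies
N = 5  # number of denoising steps quoted in the table
target = [0.1 * i for i in range(H)]  # hypothetical "clean" action chunk
chunk = sample_chunk(lambda x, tau: toy_velocity(x, tau, target), horizon=H, n_steps=N)
```

With this toy field the sampler lands on `target` (up to floating-point error) for any number of steps, because the field always points straight at the target. A learned velocity field carries no such guarantee, so the step count trades inference latency against sample quality.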