Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency

Authors: Yifei Su, Ning Liu, Dong Chen, Zhen Zhao, Kun Wu, Meng Li, Zhiyuan Xu, Zhengping Che, Jian Tang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We assess FreqPolicy on 53 tasks across 3 simulation benchmarks, proving its superiority over existing one-step action generators. We further integrate FreqPolicy into the vision-language-action (VLA) model and achieve acceleration without performance degradation on 40 tasks of Libero. Besides, we show efficiency and effectiveness in real-world robotic scenarios with an inference frequency of 93.5 Hz. (...) We conduct extensive experiments in both simulation and the real world to evaluate FreqPolicy, demonstrating its superiority over existing one-step action generators, e.g., achieving 78.5% in Meta World.
Researcher Affiliation	Academia	¹Beijing Innovation Center of Humanoid Robotics; ²NLPR, MAIS, Institute of Automation of Chinese Academy of Sciences
Pseudocode	No	The paper describes the methodology using mathematical equations and descriptive text, but it does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	No	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: All simulation data used in our experiments are sourced from publicly available benchmarks. (...) The real-world experiments are based on the open-source LeRobot framework. LeRobot is designed to support real-world robotics research by providing models, datasets, and tools in PyTorch. It includes state-of-the-art methods, such as Diffusion Policy, which we use as a baseline, that have demonstrated strong transfer capabilities to real-world settings, with a focus on imitation learning. We convert our collected real-world data into the LeRobot-supported format and integrate both the Flow Matching Policy and our proposed FreqPolicy into the framework. This enables training directly on real data, facilitating practical deployment and inference.
Open Datasets	Yes	We assess FreqPolicy on 53 tasks across 3 simulation benchmarks, proving its superiority over existing one-step action generators. (...) We evaluate FreqPolicy on 5 tasks from the widely-used Robomimic [47] benchmark: Lift, Can, Square, Transport, and Tool Hang. (...) We conduct experiments on 53 tasks across two benchmarks: Adroit [55] and Meta World [74]. (...) LIBERO simulation benchmark [37].
Dataset Splits	No	For each task, we use proficient human demonstrations datasets with image-based observations, containing 200 demonstrations per task. (...) Training is conducted using 10 expert demonstrations per task. (...) Each suite provides 500 expert demonstrations across 10 tasks (...). For each evaluation, we perform 20 trials with various initializations on the physical robot and report the mean success rate.
Hardware Specification	Yes	Training is conducted for 1000 epochs on a single NVIDIA A100 GPU. (...) All experiments are conducted on a single NVIDIA A100 GPU. (...) We train for 150K gradient steps for all models using a batch size of 16 across 4 A100 GPUs (...). We also measure and report the real-time inference frequency of each policy using an NVIDIA RTX 4090.
Software Dependencies	No	The real-world experiments are based on the open-source LeRobot framework. LeRobot is designed to support real-world robotics research by providing models, datasets, and tools in PyTorch.
Experiment Setup	Yes	All models are trained using a batch size of 128 with the AdamW optimizer and a learning rate of 1.0e-4. Training is conducted for 1000 epochs on a single NVIDIA A100 GPU. For Consistency Policy, the student model is trained for 450 epochs. (...) To train the FreqPolicy model, we use a batch size of 128 and the AdamW optimizer with a learning rate of 1.0e-4, training for 3000 epochs. (...) We train for 150K gradient steps for all models using a batch size of 16 across 4 A100 GPUs (...). Table 6: The hyperparameter settings for VLA experiments. These values are kept consistent across all methods. (...) Table 8: The hyperparameter settings for real-world experiments. These values are kept consistent across all methods.
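
The Yes/No results in the table above can be summarized with a simple count. The sketch below is only an illustrative tally; it is not the scoring method used by the index, which this page does not describe. The `tally` helper and the `results` dictionary are hypothetical names introduced here, and the two categorical rows (Research Type, Researcher Affiliation) are left out because they are not Yes/No variables.

```python
# Yes/No results copied from the table above. The two categorical rows
# (Research Type, Researcher Affiliation) are intentionally excluded.
results = {
    "Pseudocode": "No",
    "Open Source Code": "No",
    "Open Datasets": "Yes",
    "Dataset Splits": "No",
    "Hardware Specification": "Yes",
    "Software Dependencies": "No",
    "Experiment Setup": "Yes",
}


def tally(rows):
    """Return (count of 'Yes' rows, total rows).

    Hypothetical helper for illustration; not the index's own scoring.
    """
    yes = sum(1 for value in rows.values() if value == "Yes")
    return yes, len(rows)


yes_count, total = tally(results)
print(f"{yes_count} of {total} Yes/No variables are marked Yes")
```

Because the underlying labels come from an LLM-based classifier, a count like this inherits the same uncertainty as the individual rows and should be read as an estimate.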