Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
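As a rough illustration of what "validated against a manually labeled dataset" can mean, the sketch below compares LLM labels with manual labels for one Yes/No variable. It uses plain accuracy and Cohen's kappa, which corrects for chance agreement. Both are common agreement measures, but the metrics Coakley et al. actually report may differ. The function names and the six example labels are hypothetical.

```python
from collections import Counter


def accuracy(pred, gold):
    """Fraction of items whose LLM label matches the manual label."""
    if len(pred) != len(gold) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def cohens_kappa(pred, gold):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(gold)
    p_obs = accuracy(pred, gold)
    pred_counts, gold_counts = Counter(pred), Counter(gold)
    labels = set(pred_counts) | set(gold_counts)
    # Expected agreement if both labelings were independent with these marginals.
    p_exp = sum(pred_counts[k] * gold_counts[k] for k in labels) / (n * n)
    if p_exp == 1.0:
        return 1.0  # degenerate case: both sides used one identical label everywhere
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical example: six papers labeled for the "Open Source Code" variable.
gold = ["Yes", "Yes", "No", "No", "Yes", "No"]  # manual labels
pred = ["Yes", "No", "No", "No", "Yes", "No"]   # LLM labels
print(accuracy(pred, gold), cohens_kappa(pred, gold))
```

In this toy example the accuracy is 5/6 (about 0.83). The kappa is only 2/3 (about 0.67) because half of the agreement would be expected by chance given the label frequencies.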

KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills

Authors: Weiji Xie, Jinrui Han, Jiakun Zheng, Huanyu Li, Xinzhe Liu, Jiyuan Shi, Weinan Zhang, Chenjia Bai, Xuelong Li

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present experiments to evaluate the effectiveness of PBHC. Our experiments aim to answer the following key research questions: Q1. Can our physics-based motion filtering effectively filter out untrackable motions? Q2. Does PBHC achieve superior tracking performance compared to prior methods in simulation? Q3. Does the adaptive motion tracking mechanism improve tracking precision? Q4. How well does PBHC perform in real-world deployment?
Researcher Affiliation | Collaboration | (1) Institute of Artificial Intelligence (TeleAI), China Telecom; (2) Shanghai Jiao Tong University; (3) East China University of Science and Technology; (4) Harbin Institute of Technology; (5) ShanghaiTech University
Pseudocode | No | The paper describes methods and processes in detail, such as the motion processing pipeline in Section 3.1 and adaptive motion tracking in Section 3.2, using textual descriptions and mathematical formulations. However, it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The project page is https://kungfu-bot.github.io. Checklist question: "Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?" Answer: [Yes]. Justification: "We provide sufficient material in supplemental material."
Open Datasets | Yes | To enhance motion diversity, we incorporate additional data from open-source datasets, AMASS [4] and LAFAN [20].
Dataset Splits | No | We assess the policy's tracking performance using a highly-dynamic motion dataset constructed through our proposed motion processing pipeline, detailed in Appendix B. Examples are shown in Fig. 5. We categorize motions into three difficulty levels: easy, medium, and hard, based on their agility requirements. For each setting, policies are trained in Isaac Gym [29] with three random seeds and are evaluated over 1,000 rollout episodes. The paper describes a set of 13 distinct motions used for training and evaluation but does not specify explicit training/validation/test splits for these motions. Instead, an individual policy is trained for each motion and then evaluated via rollouts.
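The evaluation protocol in this row uses three random seeds and 1,000 rollout episodes. The sketch below shows one way such per-episode metrics could be aggregated. It averages within each seed, then reports the mean and sample standard deviation across seeds. The source does not say whether the paper uses this scheme or pools all 3,000 episodes. The `summarize` helper and the synthetic error values are hypothetical and are not results from the paper.

```python
import random
import statistics


def summarize(per_seed_episode_errors):
    """Average within each seed, then report mean and sample std across seeds."""
    seed_means = [statistics.fmean(errors) for errors in per_seed_episode_errors]
    mean = statistics.fmean(seed_means)
    std = statistics.stdev(seed_means) if len(seed_means) > 1 else 0.0
    return mean, std


# Synthetic stand-in data: 3 seeds x 1,000 rollout episodes of a tracking error.
# These numbers are randomly generated and are NOT results from the paper.
runs = []
for seed in range(3):
    rng = random.Random(seed)
    runs.append([abs(rng.gauss(0.05, 0.01)) for _ in range(1000)])

mean, std = summarize(runs)
print(f"tracking error: {mean:.4f} ± {std:.4f} (3 seeds x 1,000 episodes)")
```

Reporting the spread across seeds, rather than across episodes, reflects variability between independently trained policies. This is usually the more relevant quantity for judging reproducibility.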
Hardware Specification | Yes | Each experiment is conducted on a machine with a 24-core Intel i7-13700 CPU running at 5.2GHz, 32 GB of RAM, and a single NVIDIA GeForce RTX 4090 GPU, with Ubuntu 20.04.
Software Dependencies | No | Each experiment is conducted on a machine with [...] Ubuntu 20.04. We adopt an off-the-shelf RL algorithm, PPO [13], for policy optimization with an actor-critic architecture. Optimizer: Adam. Policies are trained in Isaac Gym [29]. The paper names the operating system (Ubuntu 20.04), the RL algorithm (PPO), the optimizer (Adam), and the simulator (Isaac Gym), but gives no version numbers for these or for other key software libraries and dependencies.
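This row names PPO as the policy optimizer. As background, here is a minimal plain-Python sketch of PPO's clipped surrogate objective, the term that keeps each policy update close to the data-collecting policy. It is the textbook form and is not taken from the authors' code. The value loss and entropy bonus are omitted. `clip_eps=0.2` is the common default, not a value read from the paper's Table 8. The function name `ppo_clip_loss` is hypothetical.

```python
import math


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate averaged over a batch (a loss to minimize).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current policy and the policy that collected the data.
    advantages: advantage estimates for those actions.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

With a positive advantage, the clip caps how much credit a large probability ratio can earn. With a negative advantage, the `min` keeps the full unclipped penalty.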
Experiment Setup | Yes | All reward functions are detailed in Table 6. Our reward design consists of two main parts: task rewards and regularization rewards. The specific settings are given in Table 7. The detailed PPO hyperparameters are shown in Table 8. To imitate high-dynamic motions, we introduce two curriculum mechanisms: a termination curriculum that gradually reduces tracking error tolerance, and a penalty curriculum that progressively increases the weight of regularization terms, promoting more stable and physically plausible behaviors. The gains of the PD controller are listed in Table 9.
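The sketch below illustrates the two curriculum mechanisms and the PD controller described in this row. It assumes a simple multiplicative schedule clamped to fixed bounds. Every constant and the update rule itself are hypothetical placeholders, since the actual settings live in Tables 7 and 9, which are not reproduced here. The PD law assumes a zero target joint velocity, which is common in legged-robot RL but is not confirmed by the source. `Curriculum` and `pd_torque` are illustrative names, not the authors' API.

```python
from dataclasses import dataclass


@dataclass
class Curriculum:
    # All constants are hypothetical placeholders, NOT values from Table 7.
    term_tol: float = 1.5        # tracking-error tolerance before early termination
    term_tol_min: float = 0.3    # tightest tolerance the curriculum reaches
    term_decay: float = 0.99     # multiplicative shrink per curriculum step
    penalty_scale: float = 0.1   # multiplier on regularization rewards
    penalty_max: float = 1.0     # final multiplier
    penalty_growth: float = 1.01 # multiplicative growth per curriculum step

    def step(self) -> None:
        """Advance both curricula: tighten tolerance, raise regularization weight."""
        self.term_tol = max(self.term_tol_min, self.term_tol * self.term_decay)
        self.penalty_scale = min(self.penalty_max, self.penalty_scale * self.penalty_growth)

    def should_terminate(self, tracking_error: float) -> bool:
        """End the episode once the tracking error exceeds the current tolerance."""
        return tracking_error > self.term_tol

    def total_reward(self, task_reward: float, reg_reward: float) -> float:
        """Task reward plus regularization reward scaled by the penalty curriculum.

        reg_reward is assumed to be a (typically negative) penalty term.
        """
        return task_reward + self.penalty_scale * reg_reward


def pd_torque(q_target, q, dq, kp, kd):
    """Per-joint PD law tau = kp * (q_target - q) - kd * dq (zero target velocity)."""
    return [p * (qt - qi) - d * dqi for qt, qi, dqi, p, d in zip(q_target, q, dq, kp, kd)]
```

Early in training the loose tolerance and small penalty weight let the policy explore aggressive motions without constant resets. As both schedules saturate, the policy must track closely while also satisfying the regularization terms.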