Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning

Authors: Hongyi Zhou, Weiran Liao, Xi Huang, Yucheng Tang, Fabian Otto, Xiaogang Jia, Xinkai Jiang, Simon Hilber, Ge Li, Qian Wang, Ömer Yağmurlu, Nils Blank, Moritz Reuss, Rudolf Lioutikov

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We extensively evaluate BEAST by integrating it with three distinct model architectures: a Variational Autoencoder (VAE) with continuous tokens, a decoder-only Transformer with discrete tokens, and Florence-2, a Vision Language Model with an encoder-decoder architecture, demonstrating BEAST's compatibility and scalability with large pretrained models. We evaluate BEAST across three established benchmarks consisting of 166 simulated tasks and on three distinct robot settings with a total of 8 real-world tasks. Experimental results demonstrate that BEAST (i) significantly reduces both training and inference computational costs, and (ii) consistently generates smooth, high-frequency control signals suitable for continuous control tasks while (iii) reliably achieves competitive task success rates compared to state-of-the-art methods.
Researcher Affiliation	Collaboration	Hongyi Zhou, Weiran Liao, Xi Huang, Yucheng Tang, Fabian Otto, Xiaogang Jia, Xinkai Jiang, Simon Hilber, Ge Li, Qian Wang, Ömer Erdinç Yağmurlu, Nils Blank, Moritz Reuss, Rudolf Lioutikov; Karlsruhe Institute of Technology; Robotics Institute Germany; Microsoft Research
Pseudocode	No	The paper describes the methodology using mathematical formulations (e.g., Equation 4) and provides high-level pipeline overviews in figures (e.g., Figure 2, Figure 10), but does not include any structured pseudocode or algorithm blocks.
Open Source Code	Yes	Videos and code are available at https://intuitive-robots.github.io/beast_website/.
Open Datasets	Yes	We conduct extensive evaluations in both simulated and real-world settings... Simulation Benchmarks. CALVIN [47] features 34 tabletop manipulation tasks... LIBERO [48] tests a delta-EEF controlled Panda Robot... ALOHA [6] tests an absolute joint position controlled ALOHA Robot...
Dataset Splits	Yes	CALVIN [47] features 34 tabletop manipulation tasks... We evaluate two settings: CALVIN ABC (zero-shot generalization) and CALVIN ABCD (scaling with more data)... LIBERO [48]... We report results on four specialized benchmark settings with 10 tasks each (Long, Spatial, Object, and Goal). Success is measured as the percentage of successful task completions across 50 trials per task. ALOHA [6]... The success rate is reported over 500 episodes of evaluation.
Hardware Specification	Yes	We measure the inference efficiency on an RTX 4090 GPU... Each node contains 4 NVIDIA A100, for BEAST-F we use 4 GPUs for training. For BEAST-D and BEAST-ACT, we use one GPU for training.
Software Dependencies	No	The paper mentions software components like Florence-2, CLIP, and ResNet-18, but it does not specify any version numbers for these or other key software dependencies.
Experiment Setup	Yes	Hyperparameters, with values listed in column order LIBERO SPATIAL, LIBERO OBJECT, LIBERO GOAL, LIBERO LONG, CALVIN ABCD D, CALVIN ABC D. Action Sequence Length: 20, 20, 20, 20, 20, 20; Number of Basis: 10, 10, 10, 10, 10, 10; Vocabulary Size: 256, 256, 256, 256, 256, 256; Optimizer: AdamW, AdamW, AdamW, AdamW, AdamW, AdamW; Betas: [0.9, 0.95], [0.9, 0.95], [0.9, 0.95], [0.9, 0.95], [0.9, 0.95], [0.9, 0.95]; Learning Rate: 2e-5, 2e-5, 2e-5, 2e-5, 2e-5, 2e-5; Batch Size: 128, 128, 128, 128, 32, 32; Train Steps (k): 35, 35, 50, 70, 30, 30
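
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. The Python below is an illustration only, not the paper's code: the `BeastTrainConfig` class, its field names, the `CONFIGS` keys, and the `samples_seen` helper are hypothetical names introduced here. The values are copied from the row above, reading "Train Steps (k)" as thousands of steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BeastTrainConfig:
    """Hypothetical container; defaults are the values shared by most columns."""
    action_seq_len: int = 20
    num_basis: int = 10
    vocab_size: int = 256
    optimizer: str = "AdamW"
    betas: tuple = (0.9, 0.95)
    learning_rate: float = 2e-5
    batch_size: int = 128
    train_steps: int = 35_000  # "Train Steps (k)" of 35, read as 35,000 steps


# One entry per table column; the key names are illustrative labels.
CONFIGS = {
    "libero_spatial": BeastTrainConfig(),
    "libero_object": BeastTrainConfig(),
    "libero_goal": BeastTrainConfig(train_steps=50_000),
    "libero_long": BeastTrainConfig(train_steps=70_000),
    "calvin_abcd_d": BeastTrainConfig(batch_size=32, train_steps=30_000),
    "calvin_abc_d": BeastTrainConfig(batch_size=32, train_steps=30_000),
}


def samples_seen(cfg: BeastTrainConfig) -> int:
    """Rough count of training samples processed.

    Assumption: the reported batch size is the total per optimizer step,
    with no gradient accumulation; the source does not state this.
    """
    return cfg.batch_size * cfg.train_steps


if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        print(f"{name}: {samples_seen(cfg):,} samples")
```

Keeping the shared values as defaults makes the per-benchmark differences (batch size and training steps only) easy to see at a glance.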