Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Quantization-Free Autoregressive Action Transformer

Authors: Ziyad Sheebaelhamd, Michael Tschannen, Michael Muehlebach, Claire Vernade

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate the performance of Q-FAT through extensive experiments across 9 different tasks using five representative simulated robotics environments: Push T [7], Kitchen [11], UR3 Block Push, Block Push [10] and Multimodal Ant [26]. These environments cover the most common control signals in practice, namely position and velocity of the robot's end effector and robot joint angles. A detailed description of each environment is provided in Appendix C. We further ran preliminary experiments on an autonomous driving dataset (nuScenes [4]) and we add the experimental results in Appendix E.
Researcher Affiliation	Collaboration	Ziyad Sheebaelhamd, University of Tübingen; Michael Tschannen, Google DeepMind; Michael Muehlebach, Max Planck Institute for Intelligent Systems; Claire Vernade, University of Tübingen
Pseudocode	Yes	Algorithm 1 Mode Extraction and Sampling
Open Source Code	Yes	The implementation is available at https://github.com/ziyadsheeba/qfat.
Open Datasets	Yes	We evaluate the performance of Q-FAT through extensive experiments across 9 different tasks using five representative simulated robotics environments: Push T [7], Kitchen [11], UR3 Block Push, Block Push [10] and Multimodal Ant [26]. ... We further ran preliminary experiments on an autonomous driving dataset (nuScenes [4])
Dataset Splits	Yes	Following Lee et al. [26], Shafiullah et al. [45], we split the data into 95% for training and 5% for validation.
Hardware Specification	Yes	To compare computational efficiency, we evaluated the inference times of Q-FAT and VQ-BeT on a 16 GB MacBook Pro CPU. ... However, all experiments were run on a single desktop-grade GPU with at most 32 GB of memory.
Software Dependencies	No	For our experiments, we used the minGPT [3] backbone as the decoder-only transformer implementation.
Experiment Setup	Yes	The hyperparameters used for each of the environments are detailed in Table 3. ... Table 3: Environment hyperparameters. Layers, Attention heads, Embedding dimension, Dropout probability, State history size, Action horizon, Training epochs, Batch size, Number of mixtures k, Maximum learning rate, Minimum learning rate, Learning Rate Schedule, Optimizer
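The Dataset Splits entry reports only the proportions (95% training, 5% validation). The sketch below shows one way such a split could be produced. It assumes the split is taken over item indices (for example, whole demonstration trajectories) after a seeded shuffle; the helper name `split_indices`, the seed, and the rounding rule are hypothetical and not taken from the Q-FAT code.

```python
import random


def split_indices(n_items, train_fraction=0.95, seed=0):
    """Shuffle item indices and split them into train/validation lists.

    Hypothetical helper: only the 95%/5% proportions come from the source;
    the seeded shuffle and the rounding of the train count are assumptions.
    """
    indices = list(range(n_items))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(indices)
    n_train = int(round(train_fraction * n_items))
    return indices[:n_train], indices[n_train:]


# Example: 200 items yield 190 training and 10 validation indices.
train_idx, val_idx = split_indices(200)
print(len(train_idx), len(val_idx))
```

Fixing the seed keeps the validation set identical across runs, which matters when comparing checkpoints or hyperparameter settings against the same held-out data.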