Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Quantization-Free Autoregressive Action Transformer

Authors: Ziyad Sheebaelhamd, Michael Tschannen, Michael Muehlebach, Claire Vernade

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the performance of Q-FAT through extensive experiments across 9 different tasks using five representative simulated robotics environments: PushT [7], Kitchen [11], UR3 Block Push, Block Push [10] and Multimodal Ant [26]. These environments cover the most common control signals in practice, namely position and velocity of robot's end effector and robot joint angles. A detailed description of each environment is provided in Appendix C. We further ran preliminary experiments on an autonomous driving dataset (nuScene [4]) and we add the experimental results in Appendix E.
Researcher Affiliation | Collaboration | Ziyad Sheebaelhamd (University of Tübingen); Michael Tschannen (Google DeepMind); Michael Muehlebach (Max Planck Institute for Intelligent Systems); Claire Vernade (University of Tübingen)
Pseudocode | Yes | Algorithm 1 Mode Extraction and Sampling
Open Source Code | Yes | The implementation is available at https://github.com/ziyadsheeba/qfat.
Open Datasets | Yes | We evaluate the performance of Q-FAT through extensive experiments across 9 different tasks using five representative simulated robotics environments: PushT [7], Kitchen [11], UR3 Block Push, Block Push [10] and Multimodal Ant [26]. ... We further ran preliminary experiments on an autonomous driving dataset (nuScene [4])
Dataset Splits | Yes | Following Lee et al. [26], Shafiullah et al. [45], we split the data into 95% for training and 5% for validation.
Hardware Specification | Yes | To compare computational efficiency, we evaluated the inference times of Q-FAT and VQ-BeT on a 16 GB MacBook Pro CPU. ... However, all experiments were run on a single desktop-grade GPU with at most 32 GB of memory.
Software Dependencies | No | For our experiments, we used the minGPT [3] backbone as the decoder-only transformer implementation.
Experiment Setup | Yes | The hyperparameters used for each of the environments are detailed in Table 3. ... Table 3: Environment hyperparameters. Layers, Attention heads, Embedding dimension, Dropout probability, State history size, Action horizon, Training epochs, Batch size, Number of mixtures k, Maximum learning rate, Minimum learning rate, Learning Rate Schedule, Optimizer
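
The Dataset Splits entry above reports a 95% / 5% train/validation split. The following is a minimal sketch of one way such a split could be produced; it is not taken from the paper or its repository. The function name `split_indices`, the fixed seed, and the choice to shuffle and split at the level of individual items (rather than, say, whole trajectories) are all assumptions made for illustration.

```python
import random


def split_indices(num_items, val_fraction=0.05, seed=0):
    """Return (train_indices, val_indices) for a shuffled split.

    Hypothetical helper: the split granularity and the seed are
    assumptions, not details reported in the summary above.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    indices = list(range(num_items))
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(indices)
    # Hold out at least one item for validation.
    num_val = max(1, round(num_items * val_fraction))
    return indices[num_val:], indices[:num_val]


if __name__ == "__main__":
    train_idx, val_idx = split_indices(200)
    print(len(train_idx), len(val_idx))
```

Fixing the seed means the same items land in the validation set on every run, which helps when comparing models trained at different times.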