Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Quantization-Free Autoregressive Action Transformer
Authors: Ziyad Sheebaelhamd, Michael Tschannen, Michael Muehlebach, Claire Vernade
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of Q-FAT through extensive experiments across 9 different tasks using five representative simulated robotics environments: PushT [7], Kitchen [11], UR3 Block Push, Block Push [10] and Multimodal Ant [26]. These environments cover the most common control signals in practice, namely position and velocity of robot's end effector and robot joint angles. A detailed description of each environment is provided in Appendix C. We further ran preliminary experiments on an autonomous driving dataset (nuScene [4]) and we add the experimental results in Appendix E. |
| Researcher Affiliation | Collaboration | Ziyad Sheebaelhamd (University of Tübingen); Michael Tschannen (Google DeepMind); Michael Muehlebach (Max Planck Institute for Intelligent Systems); Claire Vernade (University of Tübingen) |
| Pseudocode | Yes | Algorithm 1 Mode Extraction and Sampling |
| Open Source Code | Yes | The implementation is available at https://github.com/ziyadsheeba/qfat. |
| Open Datasets | Yes | We evaluate the performance of Q-FAT through extensive experiments across 9 different tasks using five representative simulated robotics environments: PushT [7], Kitchen [11], UR3 Block Push, Block Push [10] and Multimodal Ant [26]. ... We further ran preliminary experiments on an autonomous driving dataset (nuScene [4]) |
| Dataset Splits | Yes | Following Lee et al. [26], Shafiullah et al. [45], we split the data into 95% for training and 5% for validation. |
| Hardware Specification | Yes | To compare computational efficiency, we evaluated the inference times of Q-FAT and VQ-BeT on a 16 GB MacBook Pro CPU. ... However, all experiments were run on a single desktop-grade GPU with at most 32 GB of memory. |
| Software Dependencies | No | For our experiments, we used the minGPT [3] backbone as the decoder-only transformer implementation. |
| Experiment Setup | Yes | The hyperparameters used for each of the environments are detailed in Table 3. ... Table 3: Environment hyperparameters. Layers, Attention heads, Embedding dimension, Dropout probability, State history size, Action horizon, Training epochs, Batch size, Number of mixtures k, Maximum learning rate, Minimum learning rate, Learning Rate Schedule, Optimizer |
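
The Dataset Splits row quotes a 95% training / 5% validation split. As a rough illustration only, the sketch below shows one way such a split could be produced with a seeded shuffle. The function name `split_train_val`, the seed, and the choice to split whole demonstration episodes are assumptions made for this example; the quoted text does not say how the authors shuffle or at what granularity they split.

```python
import random


def split_train_val(items, val_fraction=0.05, seed=0):
    """Hypothetical helper: seeded shuffle, then hold out a validation subset.

    Returns (train, val); both keep the original relative order of `items`.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")

    # Shuffle indices with a private RNG so the split is reproducible
    # and does not disturb the global random state.
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)

    # Hold out at least one item whenever the input is non-empty.
    n_val = max(1, round(len(items) * val_fraction))
    val_idx = set(indices[:n_val])

    train = [x for i, x in enumerate(items) if i not in val_idx]
    val = [x for i, x in enumerate(items) if i in val_idx]
    return train, val


if __name__ == "__main__":
    # Placeholder episode names standing in for demonstration trajectories.
    demos = [f"episode_{i}" for i in range(200)]
    train, val = split_train_val(demos)
    print(len(train), len(val))
```

Fixing the seed makes the held-out set identical across runs, which matters when validation numbers are compared between configurations.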