Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Alias-Free ViT: Fractional Shift Invariance via Linear Attention
Authors: Hagay Michaeli, Daniel Soudry
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our Alias-Free Transformer (AFT) on the ImageNet dataset [12] and compare its accuracy and shift consistency with the baseline XCiT model. We additionally compare our method with the adaptive polyphase sampling (APS) approach [13, 47]... In Sections 4.1 and 4.2 we evaluate the baseline, APS, and AFT models using cyclic translations and implement m/n-fractional translation by translating in m pixels the n-upsampled image using sinc-interpolation... In Section 4.3 we use more realistic types of translations, and add additional publicly available ViTs to the comparison. 4.4 Ablation study |
| Researcher Affiliation | Academia | Hagay Michaeli Technion Haifa, Israel EMAIL Daniel Soudry Technion Haifa, Israel EMAIL |
| Pseudocode | No | The paper describes the methodology using prose and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at github.com/hmichaeli/alias_free_vit. |
| Open Datasets | Yes | We evaluate our Alias-Free Transformer (AFT) on the ImageNet dataset [12]... We evaluate all models from section 4.1 on three additional classification benchmarks CIFAR-10, CIFAR-100 [37], and Stanford Cars [36] |
| Dataset Splits | Yes | The alias-free models have similar accuracy to the baseline models, and much higher consistency in both integer and half-pixel translations. The APS models, on the other hand, achieve near-100% consistency under integer translations, as expected. However, they have a more modest improvement in consistency to half-pixel shifts... In Section 4.3 we use more realistic types of translations, and add additional publicly available ViTs to the comparison. |
| Hardware Specification | Yes | We train all models for 400 epochs, following the XCiT training recipe [16], using PyTorch [41], on a single machine with 8 NVIDIA RTX A6000. We observe a slight improvement for the AF models when training with a smaller batch size; therefore, we reduce the batch size from 1024 to 512 for the AF versions (See additional details in Appendix D.1). |
| Software Dependencies | No | We train all models for 400 epochs, following the XCiT training recipe [16], using PyTorch [41], on a single machine with 8 NVIDIA RTX A6000. |
| Experiment Setup | Yes | We train all models for 400 epochs, following the XCiT training recipe [16]... We reduce the batch size from 1024 to 512 for the AF versions (See additional details in Appendix D.1). ... Table 8: Hyperparameters. Unless stated otherwise, the same settings apply to all models. Optimizer: AdamW; (β1, β2): (0.9, 0.999); Weight decay: 0.05; ... Base LR: 1 × 10⁻³ (Baseline), 5 × 10⁻⁴ (AF, APS); Warm-up epochs: 5; LR decay: Cosine; Min LR: 1 × 10⁻⁵. Data: Resolution: 224 × 224; Batch size: 1024 (Baseline), 512 (AF, APS). Regularization: Layer scale (ϵ init): 1.0; Stochastic depth: 0.0 (Nano), 0.05 (Small) |
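
The Research Type row quotes the paper's m/n-fractional translation: upsample the image by n with sinc interpolation, translate by m pixels, and return to the original grid. The following is a minimal sketch of that idea, not the authors' implementation (their code is at the repository linked above). It makes several assumptions: the signal is 1-D and periodic, a positive m moves content toward higher indices, and the helper names `dft`, `sinc_upsample`, and `fractional_cyclic_shift` are hypothetical. It uses a naive pure-Python DFT for clarity rather than speed.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a real sequence."""
    size = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / size) for t in range(size))
        for k in range(size)
    ]


def sinc_upsample(x, n):
    """Upsample a periodic 1-D signal by integer factor n (band-limited interpolation)."""
    size = len(x)
    spectrum = dft(x)
    upsampled = []
    for j in range(size * n):
        pos = j / n  # position measured in original-sample units
        acc = 0.0
        for k, coeff in enumerate(spectrum):
            if size % 2 == 0 and k == size // 2:
                # Nyquist bin: split evenly between +N/2 and -N/2 so the output stays real.
                acc += coeff * math.cos(math.pi * pos)
            else:
                freq = k if k < size / 2 else k - size  # signed frequency index
                acc += coeff * cmath.exp(2j * math.pi * freq * pos / size)
        upsampled.append((acc / size).real)
    return upsampled


def fractional_cyclic_shift(x, m, n):
    """Cyclically shift x by m/n samples: upsample by n, roll by m, keep every n-th sample."""
    up = sinc_upsample(x, n)
    total = len(up)
    shifted = [up[(j - m) % total] for j in range(total)]  # cyclic translation by m
    return shifted[::n]  # back to the original sampling grid


if __name__ == "__main__":
    signal = [math.cos(2 * math.pi * t / 8) for t in range(8)]
    print(fractional_cyclic_shift(signal, 1, 2))  # half-sample cyclic shift
```

For a 2-D image the same operation would be applied along each spatial axis, and an FFT routine would replace the naive DFT in practice.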