Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
A Circular Argument: Does RoPE need to be Equivariant for Vision?
Authors: Chase van de Geijn, Timo Lüddecke, Polina Turishcheva, Alexander Ecker
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5, we find that Spherical RoPE has the same training behaviors as its equivariant analogues and we find that Uniform RoPE outperforms the standard learned encodings, while performing worse than other RoPE methods. We conclude that our evidence suggests that the performance of RoPE over traditional embeddings is not explained by equivariance. ... We test the different PEs on CIFAR100 [39] and ImageNet [58] using a standard Vision Transformer, the ViT-S implementation from the timm [79] library. ... Table 2: Performance comparison (top-1 accuracy) across datasets and methods. ... Figure 3: Dependence of accuracy on image resolution for ViT-S with various positional embedding methods on ImageNet-1k. |
| Researcher Affiliation | Academia | Chase van de Geijn¹, Timo Lüddecke¹, Polina Turishcheva¹, Alexander S. Ecker¹,²; ¹Institute of Computer Science and Campus Institute Data Science, University of Göttingen; ²Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany |
| Pseudocode | No | The paper describes methods mathematically and textually, such as in Section 2 (Background) and Section 3 (The Generality of Learned RoPE and Mixed RoPE), but does not include any clearly labeled pseudocode or algorithm blocks. For example, Appendix F ("Fast Implementation") describes mathematical operations rather than structured pseudocode. |
| Open Source Code | No | In the NeurIPS Paper Checklist, under section 5, the authors state: 'Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We intend to make the code public.' |
| Open Datasets | Yes | Datasets and architecture. We test the different PEs on CIFAR100 [39] and ImageNet [58] using a standard Vision Transformer, the ViT-S implementation from the timm [79] library. |
| Dataset Splits | Yes | Datasets and architecture. We test the different PEs on CIFAR100 [39] and ImageNet [58] using a standard Vision Transformer, the ViT-S implementation from the timm [79] library. ... We evaluate without any hyperparameter tuning directly on the validation sets. ... In this section, we include extra evaluations including basic data scaling... We partition the CIFAR100 dataset into smaller subsets. ... Table 7: Performance on different portions of CIFAR100. Dataset Size 0.2, 0.4, 0.6, 0.8 |
| Hardware Specification | Yes | CIFAR100: All experiments on CIFAR100 were performed on one A100 GPU with a batch size of 256. ... ImageNet: All experiments on ImageNet-1k were performed on four A100 GPUs with a batch size of 256. |
| Software Dependencies | No | The paper mentions using the 'timm [79] library' for the ViT-S implementation and 'torch.autocast' for backend precision (Appendices G and H). However, it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | For CIFAR100, the embedding dimensions are changed from 64 × N_heads to 60 × N_heads to be compatible with pairs, triplets and quadruples. For ImageNet, we make the embedding dimension 63 × N_heads for Spherical RoPE and 64 × N_heads for other methods. ... CIFAR100: All experiments on CIFAR100 were performed on one A100 GPU with a batch size of 256. We use a patch size of 4×4 on the original image size of 32×32. The training uses heavy regularization and augmentations including dropout, MixUp [87] and CutMix [86]. The models are trained for 400 epochs, taking 40 seconds per training loop. ... ImageNet: All experiments on ImageNet-1k were performed on four A100 GPUs with a batch size of 256. We used a cosine learning rate schedule with a learning rate of 3e-3 for 200 epochs with 5 epochs of linear warm-up. ... Table 5: Hyperparameters for ImageNet-1K Training ... Table 6: Hyperparameters for CIFAR100 Training |
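Two details in the Experiment Setup row can be checked with a little arithmetic. The per-head widths of 60 and 63 follow from requiring the head dimension to split evenly into the channel groups that the rotations act on. The ImageNet schedule is linear warm-up followed by cosine decay. The sketch below is illustrative only and is not the authors' code. The function names are hypothetical. The assumptions are that warm-up starts from 0, that the decay reaches `min_lr = 0` at epoch 200, and that N_heads = 6, which is the default for timm's ViT-S (384 = 64 × 6). The paper's exact values are in its Tables 5 and 6.

```python
import math


def largest_compatible_head_dim(max_dim: int, group_sizes: list[int]) -> int:
    """Largest per-head dimension <= max_dim that every group size divides.

    Rotations act on pairs (2), triplets (3) or quadruples (4) of channels,
    so the head dimension must be a multiple of their least common multiple.
    """
    step = math.lcm(*group_sizes)  # requires Python 3.9+
    return (max_dim // step) * step


def lr_at_epoch(epoch: float, peak_lr: float = 3e-3, total_epochs: int = 200,
                warmup_epochs: int = 5, min_lr: float = 0.0) -> float:
    """Linear warm-up then cosine decay (hypothetical helper; shape assumed).

    Assumes warm-up starts at 0 and the cosine reaches min_lr at total_epochs.
    """
    if epoch < warmup_epochs:
        # Linear ramp from 0 to peak_lr over the warm-up epochs.
        return peak_lr * epoch / warmup_epochs
    # Fraction of the post-warm-up phase completed, clipped to [0, 1].
    progress = min((epoch - warmup_epochs) / (total_epochs - warmup_epochs), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    n_heads = 6  # assumption: timm ViT-S default
    pairs_triplets_quads = largest_compatible_head_dim(64, [2, 3, 4])
    triplets_only = largest_compatible_head_dim(64, [3])
    print(pairs_triplets_quads * n_heads, triplets_only * n_heads)  # 360 378
    for e in (0, 5, 102.5, 200):
        print(e, lr_at_epoch(e))
```

Under these assumptions, 60 is the largest multiple of lcm(2, 3, 4) = 12 that does not exceed 64. A width of 63 suits a method built on channel triplets, which fits Spherical RoPE using 63 × N_heads while the other methods keep 64.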