Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Attention on the Sphere

Authors: Boris Bonev, Max Rietmann, Andrea Paris, Alberto Carpentieri, Thorsten Kurth

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate our method by adapting the popular Vision Transformer (ViT) and SegFormer architecture to equirectangular grids on the spherical domain. [...] A comprehensive study comparing spherical Transformers to their corresponding Euclidean baseline is carried out for both global and local (neighborhood) attention variants. [...] The method is validated on three diverse tasks: simulating shallow water equations on the rotating sphere, spherical image segmentation, and spherical depth estimation. Across all tasks, our spherical Transformers consistently outperform their planar counterparts, highlighting the advantage of geometric priors for learning on spherical domains. Also, the paper includes a dedicated "4 Experiments" section with subsections like "4.1 Segmentation on the sphere", "4.2 Depth estimation on the sphere", "4.3 Shallow water equations on the rotating sphere", and "4.4 Ablation study", featuring performance tables and metrics.
Researcher Affiliation Industry Boris Bonev *, Max Rietmann *, Andrea Paris, Alberto Carpentieri, Thorsten Kurth * NVIDIA Corporation, 95051 Santa Clara, CA, USA EMAIL
Pseudocode No The paper describes methods using mathematical formulations (e.g., equations (1)-(6)) and implementation details (Section 3.3, Appendix B), but it does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures.
Open Source Code Yes Our implementation and code to reproduce experiments are available in the open-source library torch-harmonics. ... We implement both spherical attention mechanisms for equiangular and Gaussian grids and make them publicly available in torch-harmonics, a library for machine learning and signal processing on the sphere.
Open Datasets Yes The Stanford 2D3DS Dataset [1] provides spherical images taken indoors in a university setting. ... The semantic labels is provided as a json file in the GitHub repository, while the image data itself is hosted at https://cvg-data.inf.ethz.ch/2d3ds/no_xyz/.
Dataset Splits Yes For the purpose of training, we downsample the data to a resolution of 128x256 and split the dataset into 95% train, 2.5% test and 2.5% validation parts.
Hardware Specification Yes On a single NVIDIA RTX 6000 Ada, training took between 12 and 215 minutes depending on the respective models. ... Training times vary between 11 and 220 minutes on a single NVIDIA RTX6000 Ada GPU depending on the architecture. ... On an NVIDIA A6000 GPU, training took approximately between 24 and 34 minutes depending on the model.
Software Dependencies No The paper mentions "PyTorch" and "torch-harmonics" (in "Our implementation and code to reproduce experiments are available in the open-source library torch-harmonics." and "Our global spherical attention mechanism leverages PyTorch's native scaled dot product attention (SDPA)") but does not provide specific version numbers for these or other software dependencies.
Experiment Setup Yes Training is carried using uniform weighted cross entropy loss for 200 epochs using the ADAMW optimizer [31], a learning rate of 0.5 × 10^-4 and a cosine scheduler. Furthermore, to reduce overfitting, weight decay and dropout path rate are both set to 0.1. ... Models are trained for 100 epochs using the L_depth objective functions (16) which combines L1 and Sobolev W^{1,1} losses... All architectures are trained using solutions output by a traditional spectral solver, with 12 classical time-steps steps to provide the single prediction step at a learning rate of 10^-3. All of the above is carried out using the ADAM optimizer [27] and using a ReduceLROnPlateau learning rate scheduler.
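
For readers who want to reproduce the quoted setup, here is a minimal standard-library sketch of two quantities from the table: the 95% / 2.5% / 2.5% train/test/validation split and a cosine learning-rate schedule over 200 epochs. It is an illustration under stated assumptions, not the authors' code. The function names, the random shuffle and its seed, the absence of warmup, and the minimum learning rate of zero are all hypothetical. The base learning rate of 0.5e-4 reflects a reading of the quoted value as 0.5 × 10^-4.

```python
import math
import random


def split_indices(n_samples, train_frac=0.95, test_frac=0.025, seed=0):
    """Shuffle sample indices and cut them into train/test/validation parts.

    The random shuffle and the seed are assumptions; the quoted text only
    gives the proportions (95% train, 2.5% test, 2.5% validation).
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_frac)
    n_test = round(n_samples * test_frac)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    val = indices[n_train + n_test:]  # remainder, roughly 2.5%
    return train, test, val


def cosine_lr(epoch, total_epochs=200, base_lr=0.5e-4, min_lr=0.0):
    """Cosine-annealed learning rate at a given epoch.

    No warmup and min_lr=0 are assumptions; the quoted text only says
    "a cosine scheduler".
    """
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Example usage with a hypothetical dataset of 1000 samples.
train, test, val = split_indices(1000)
print(len(train), len(test), len(val))
print(cosine_lr(0), cosine_lr(100), cosine_lr(200))
```

In an actual PyTorch training run, the schedule would more likely come from a built-in scheduler attached to the optimizer. The closed form is shown here only to make the decay from the base rate to zero explicit.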