Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Rotary Masked Autoencoders are Versatile Learners

Authors: Uros Zivanovic, Serafina Di Gioia, Andre Scaffidi, Martín de los Rios, Gabriella Contardo, Roberto Trotta

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We showcase RoMAE's performance on a variety of modalities including irregular and multivariate time-series, images, and audio, demonstrating that RoMAE surpasses specialized time-series architectures on difficult datasets such as the DESC ELAsTiCC Challenge while maintaining MAE's usual performance across other modalities. In addition, we investigate RoMAE's ability to reconstruct the embedded continuous positions, demonstrating that including learned embeddings in the input sequence breaks RoPE's relative position property. Section 5 presents the results of our experiments.
Researcher Affiliation	Academia	Uros Zivanovic (1), Serafina Di Gioia (2, 3), Andre Scaffidi (3), Martín de los Rios (3), Gabriella Contardo (7, 3), and Roberto Trotta (3, 4, 5, 6). (1) University of Trieste, Italy; (2) Abdus Salam International Centre for Theoretical Physics (ICTP), Italy; (3) Scuola Internazionale Superiore di Studi Avanzati (SISSA), Italy; (4) INFN National Institute for Nuclear Physics, Italy; (5) ICSC Centro Nazionale di Ricerca in High Performance Computing, Italy; (6) Imperial College London, United Kingdom; (7) University of Nova Gorica, Slovenia
Pseudocode	No	The paper describes the RoMAE pipeline in Figure 1 and its overall structure in Section 4.1. However, it does not include a formally labeled 'Pseudocode' or 'Algorithm' block, nor does it present structured steps in a code-like format outside of explanatory text and diagrams.
Open Source Code	Yes	Model and experimental code for RoMAE is made public through a convenient Python package (footnote 2: https://chromeilion.github.io/RoMAE-Website/).
Open Datasets	Yes	The experiment on the Tiny ImageNet data set (Section 5.2) was run on one node of a Slurm cluster, utilizing two NVIDIA Tesla V100 GPUs for 5 hours. The experiment on the DESC ELAsTiCC Challenge (Section 5.4), was run on a Slurm cluster using 4 nodes for 4 hours with each having 4 Nvidia A100 (with 64GB memory) GPUs. Together, the experiments on the UEA Time-Series Archive [3] (Section 5.4), Pendulum dataset [5] (Section 5.4), and absolute position experiments (Section 5.1) were run on a 1080ti GPU for a total of 1.5 hours. All interpolation experiments (Sec. 5.5) were run on a single NVIDIA A100-PCIE-40GB GPU (internal cluster), utilising 5 GB memory, 10 min for the spirals dataset, 30 min for the synthetic dataset, and 3 hours for PhysioNet. Model and experimental code for RoMAE is made public through a convenient Python package.
Dataset Splits	Yes	Our generated training set has 20 000 samples while our generated test set has 4000. For the Audioset dataset we used the training/validation split provided by the downloaded dataset, while for Librispeech, downloaded and preprocessed using the scripts provided on the SSAST Github repo, we used a 70/30 split. The dataset is split into 80% training, 10% validation, and 10% test sets. We construct a dataset of 300 spirals as per the prescription from Ref. [12], similarly allocating 200 for training and 100 for testing.
Hardware Specification	Yes	The experiment on the Tiny ImageNet data set (Section 5.2) was run on one node of a Slurm cluster, utilizing two NVIDIA Tesla V100 GPUs for 5 hours. The experiment on the DESC ELAsTiCC Challenge (Section 5.4), was run on a Slurm cluster using 4 nodes for 4 hours with each having 4 Nvidia A100 (with 64GB memory) GPUs. Together, the experiments on the UEA Time-Series Archive [3] (Section 5.4), Pendulum dataset [5] (Section 5.4), and absolute position experiments (Section 5.1) were run on a 1080ti GPU for a total of 1.5 hours. All interpolation experiments (Sec. 5.5) were run on a single NVIDIA A100-PCIE-40GB GPU (internal cluster), utilising 5 GB memory, 10 min for the spirals dataset, 30 min for the synthetic dataset, and 3 hours for PhysioNet.
Software Dependencies	No	Although all final results are in full FP32 precision, we also tried mixed precision training through PyTorch Automatic Mixed Precision (AMP). When training with AMP, some operations are conducted in a lower precision (either 16-bit brain floating-point (BF16) or 16-bit floating-point (FP16)) instead of the usual 32-bit floating point. This speeds the model up greatly, resulting in significantly less compute resources being used. In our experiments we found that RoMAE still converged well when using mixed precision. While PyTorch AMP is mentioned, no specific version numbers for PyTorch or other libraries are provided.
Experiment Setup	Yes	Throughout the experiments we make use of different sizes of RoMAE: RoMAE-tiny, RoMAE-small, and RoMAE-base, as detailed in Appendix A.1. Compute Details: The experiment on the Tiny ImageNet data set (Section 5.2) was run on one node of a Slurm cluster, utilizing two NVIDIA Tesla V100 GPUs for 5 hours. The experiment on the DESC ELAsTiCC Challenge (Section 5.4), was run on a Slurm cluster using 4 nodes for 4 hours with each having 4 Nvidia A100 (with 64GB memory) GPUs. Together, the experiments on the UEA Time-Series Archive [3] (Section 5.4), Pendulum dataset [5] (Section 5.4), and absolute position experiments (Section 5.1) were run on a 1080ti GPU for a total of 1.5 hours. All interpolation experiments (Sec. 5.5) were run on a single NVIDIA A100-PCIE-40GB GPU (internal cluster), utilising 5 GB memory, 10 min for the spirals dataset, 30 min for the synthetic dataset, and 3 hours for PhysioNet. Model and experimental code for RoMAE is made public through a convenient Python package. Appendix D provides full experimental details and hyperparameters for each experiment, including optimizer, learning rates, batch sizes, epochs, gradient clipping, LR schedules, dropout, stochastic depth, label smoothing, and precision (Tables 10-21).
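
The compute figures quoted in the Hardware Specification row can be combined into a rough GPU-hour budget for anyone planning a reproduction. The sketch below is a minimal Python tally, not something taken from the paper or its code. It assumes that each quoted duration is wall-clock time during which every listed GPU was in use, and that the three interpolation runs were executed one after another on a single GPU. The names `RUNS` and `gpu_hours` are illustrative only.

```python
# Rough GPU-hour tally from the compute details quoted in the table above.
# Assumption: each duration is wall-clock time with all listed GPUs busy.

# (label, number of GPUs, wall-clock hours)
RUNS = [
    ("Tiny ImageNet (Sec. 5.2), 2x V100", 2, 5.0),
    ("ELAsTiCC (Sec. 5.4), 4 nodes x 4 A100", 4 * 4, 4.0),
    ("UEA, Pendulum, absolute position, 1x 1080ti", 1, 1.5),
    # 10 min (spirals) + 30 min (synthetic) + 3 h (PhysioNet), assumed sequential
    ("Interpolation (Sec. 5.5), 1x A100", 1, (10 + 30) / 60 + 3.0),
]


def gpu_hours(runs):
    """Return {label: GPUs * hours} for each run."""
    return {label: gpus * hours for label, gpus, hours in runs}


if __name__ == "__main__":
    per_run = gpu_hours(RUNS)
    for label, value in per_run.items():
        print(f"{label}: {value:.1f} GPU-hours")
    print(f"Total: {sum(per_run.values()):.1f} GPU-hours")
```

Under these assumptions the ELAsTiCC run accounts for 64 of roughly 79 GPU-hours. The total mixes different GPU generations, so it is only a coarse indicator of cost.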