Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

EquiTabPFN: A Target-Permutation Equivariant Prior Fitted Network

Authors: Michael Arbel, David Salinas, Frank Hutter

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical evaluation on standard classification benchmarks shows that, on datasets with more classes than those seen during pre-training, our model matches or surpasses existing methods while incurring lower computational overhead.
Researcher Affiliation	Collaboration	Michael Arbel (1), David Salinas (2,3), Frank Hutter (2,3,4); (1) INRIA, (2) University of Freiburg, (3) ELLIS Institute Tübingen, (4) Prior Labs. Equal contribution
Pseudocode	No	The paper describes the model architecture and training procedures in detail, including mathematical formulations for attention mechanisms, but does not present them in a structured pseudocode or algorithm block format.
Open Source Code	Yes	The code for training and evaluating our model is available at https://github.com/MichaelArbel/EquiTabPFN/.
Open Datasets	Yes	For evaluation, we consider classification tasks from the TabZilla benchmark [21]. Details of the datasets/tasks used are provided in Tables 1 and 2.
Dataset Splits	Yes	In this protocol, each dataset has a fixed size of 1024 and is split into training and test uniformly at random.
Hardware Specification	Yes	The total training time took approximately 4 days on a single A100 GPU with 80GB of memory. ... The runtime is displayed with color on a log scale and is reported on a V100 GPU for PFNs. ... On an A100 GPU with 2,000 samples (100 features, 10 classes), EquiTabPFN required 566 GFLOPS vs. 76 GFLOPS for TabPFN (7.45× more).
Software Dependencies	No	We used the Adam optimizer [15] with initial learning rate of 0.0001 and linear-warmup scheduler for the first 10 epochs followed by cosine annealing [18] as in Hollmann et al. [12]. No specific software versions for libraries like PyTorch or CUDA are provided.
Experiment Setup	Yes	Training is performed using 153600 batches of 72 synthetically generated datasets each, which means the model was exposed to 11M artificial datasets during pre-training, a similar order of magnitude to the number of datasets used for pre-training TabPFN by Hollmann et al. [12]. ... We used the Adam optimizer [15] with initial learning rate of 0.0001 and linear-warmup scheduler for the first 10 epochs followed by cosine annealing [18] as in Hollmann et al. [12]. ... We use an EquiTabPFN network with 12 self-attention layers alternating between both types of attention introduced in Section 4: 6 blocks SelfAtt_c and 6 blocks SelfAtt_b. Each self-attention layer consists of a multi-head attention block with 4 heads, embeddings of dimension 512, and hidden layers of dimension 1024.
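
The figures quoted in the Experiment Setup and Hardware Specification rows can be checked with a few lines of Python. The sketch below recomputes the number of pre-training datasets (153600 batches of 72), the ratio between the two reported GFLOPS figures, and one plausible reading of the learning-rate schedule (linear warmup for 10 epochs, then cosine annealing). It is not the authors' code: the function names are hypothetical, and the total epoch count, the warmup starting value, and the final learning rate of zero are assumptions, since the table does not state them.

```python
import math

# Values quoted in the table above.
NUM_BATCHES = 153_600
DATASETS_PER_BATCH = 72
BASE_LR = 1e-4
WARMUP_EPOCHS = 10

# Assumption: the total number of epochs is not given in the table;
# 100 is a placeholder chosen only to make the sketch runnable.
TOTAL_EPOCHS = 100


def total_datasets(num_batches=NUM_BATCHES, per_batch=DATASETS_PER_BATCH):
    """Number of synthetic datasets seen during pre-training."""
    return num_batches * per_batch


def learning_rate(epoch, base_lr=BASE_LR, warmup=WARMUP_EPOCHS, total=TOTAL_EPOCHS):
    """Linear warmup for `warmup` epochs, then cosine annealing.

    Assumptions: warmup ramps from base_lr / warmup up to base_lr,
    and the cosine phase decays to zero at epoch `total`.
    """
    if epoch < warmup:
        return base_lr * (epoch + 1) / warmup
    progress = (epoch - warmup) / max(1, total - warmup)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


def flops_ratio(equitabpfn_gflops=566.0, tabpfn_gflops=76.0):
    """Ratio between the two GFLOPS figures reported in the table."""
    return equitabpfn_gflops / tabpfn_gflops


if __name__ == "__main__":
    print(f"datasets seen: {total_datasets():,}")
    print(f"GFLOPS ratio:  {flops_ratio():.2f}")
    for epoch in (0, 5, 9, 10, 55, 100):
        print(f"epoch {epoch:3d}: lr = {learning_rate(epoch):.2e}")
```

The product 153600 × 72 is about 11 million, consistent with the "11M artificial datasets" quoted above, and 566 / 76 rounds to 7.45, consistent with the reported overhead.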