Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

TabDPT: Scaling Tabular Foundation Models on Real Data

Authors: Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh, Alex Labach, Jesse C. Cresswell, Keyvan Golestan, Guangwei Yu, Anthony L. Caterini, Maks Volkovs

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Comprehensive evaluations on the OpenML-CC18 [5] and OpenML-CTR23 [17] benchmarks confirm the effectiveness of TabDPT. It consistently matches or surpasses the performance of specialized models that undergo extensive per-dataset hyperparameter optimization at a fraction of the deployment time and cost. Furthermore, we show strong results in the few-shot regime, where, with minimal semi-supervised modifications, TabDPT outperforms specialized baselines on 10-shot classification tasks, highlighting its versatility. Finally, we demonstrate that TabDPT scales predictably with both model size and quantity of real pre-training data (Figure 1), underscoring the viability of large-scale foundation models for tabular domains.
Researcher Affiliation	Industry	Junwei Ma, Valentin Thomas¹, Rasa Hosseinzadeh, Alex Labach, Hamidreza Kamkari, Jesse C. Cresswell, Keyvan Golestan, Guangwei Yu, Anthony L. Caterini¹, Maksims Volkovs. Layer 6 AI, Toronto. ¹Correspondence to EMAIL
Pseudocode	Yes	Algorithm 1: One Training Step of TabDPT; Code Block 1: PyTorch Dataloader; Code Block 2: Training Loop
Open Source Code	Yes	We open-source our full pipeline: inference code including trained model weights can be found here, and the training code to reproduce experiments can be found here.
Open Datasets	Yes	Our training data was collected from OpenML [61] and consists of a wide range of public tabular datasets across diverse domains, all available under the CC-BY licence. For evaluation, we consider two commonly used public benchmarks containing a total of 107 datasets: CC18 [5] for classification tasks and CTR23 [17] for regression tasks. All data is publicly available on OpenML.
Dataset Splits	No	We run all methods on at least two different splits of the data and report 95% confidence intervals using bootstrapping [1]. For XGBoost, CatBoost, and LightGBM, we use results reported in the TabZilla benchmark [46]. Some datasets are missing results, so we conduct hyperparameter optimization and train models following the TabZilla protocol using the code repository from [22].¹
Hardware Specification	Yes	All training and inference is done on Nvidia A100 GPUs with 40 GB of memory.
Software Dependencies	No	All columns containing non-numerical values are mapped to integers using scikit-learn's [49] LabelEncoder function. We use the faiss library³ for fast retrieval.
Experiment Setup	Yes	By default we set a learning rate of 5 × 10⁻⁴ and weight decay of 5 × 10⁻² with label smoothing of 0.1. The batch size is set to 256 and both context and query lengths are set to 1024. Model parameters are kept in brain float 16-bit (bfloat16) format. Table C.1: Architectural Parameters; Table C.2: Number of Layers and Transformer Dimensions
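The Dataset Splits row says 95% confidence intervals are obtained by bootstrapping. A percentile bootstrap over per-split scores can illustrate the idea. This is a minimal sketch under assumptions. The quoted text does not say what unit is resampled or which bootstrap variant is used. Resampling individual scores with replacement, taking the 2.5th and 97.5th percentiles of the resampled means, the function name `bootstrap_ci` and its defaults are all illustrative choices.

```python
import numpy as np


def bootstrap_ci(scores, n_resamples=10_000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `scores`.

    Returns (mean, lower, upper). Hypothetical helper: the resampling unit
    and bootstrap variant are assumptions, not the authors' procedure.
    """
    scores = np.asarray(scores, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Draw n_resamples samples of the original size, with replacement.
    idx = rng.integers(0, scores.size, size=(n_resamples, scores.size))
    resampled_means = scores[idx].mean(axis=1)
    # Central interval: cut (1 - level) / 2 of the mass from each tail.
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(resampled_means, [alpha, 1.0 - alpha])
    return float(scores.mean()), float(lower), float(upper)


if __name__ == "__main__":
    # Hypothetical per-split accuracies for one method on one dataset.
    mean, lower, upper = bootstrap_ci([0.80, 0.82, 0.85, 0.79, 0.90])
    print(f"mean={mean:.3f}, 95% CI=[{lower:.3f}, {upper:.3f}]")
```

With only two scores a and b, the resampled means can take just the three values a, (a + b)/2 and b. This is why the choice of resampling unit matters when each method is run on as few as two splits.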
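The Software Dependencies row names two preprocessing steps: integer-encoding non-numerical columns with scikit-learn's LabelEncoder, and fast retrieval with faiss. Both can be sketched with NumPy alone. This is a hedged sketch, not the authors' pipeline. `np.unique(..., return_inverse=True)` reproduces the sorted-class coding that LabelEncoder assigns. A brute-force exact L2 search stands in for faiss; faiss's `IndexFlatL2` performs the same exact search much faster. The helper names are hypothetical. So is the assumption that "retrieval" means selecting each query row's k nearest training rows.

```python
import numpy as np


def encode_non_numeric(table):
    """Map each non-numerical column to integer codes, LabelEncoder-style.

    `table` is a hypothetical dict of column name -> 1-D sequence. Returns
    the encoded columns and, for each encoded column, its sorted classes.
    """
    encoded, classes = {}, {}
    for name, values in table.items():
        col = np.asarray(values)
        if col.dtype.kind in "OUS":  # object, unicode or bytes: non-numerical
            # np.unique sorts the distinct values; return_inverse gives each
            # entry's position in that sorted list (LabelEncoder's coding).
            uniq, codes = np.unique(col.astype(str), return_inverse=True)
            encoded[name] = codes.astype(np.int64)
            classes[name] = uniq.tolist()
        else:
            encoded[name] = col.astype(np.float64)
    return encoded, classes


def knn_indices(train_X, query_X, k):
    """Exact k-nearest-neighbour search under squared L2 distance.

    Brute-force stand-in for a faiss index; for every query row, returns
    the indices of its k closest training rows, nearest first.
    Requires 1 <= k <= number of training rows.
    """
    train_X = np.asarray(train_X, dtype=np.float64)
    query_X = np.asarray(query_X, dtype=np.float64)
    # ||q - t||^2 = ||q||^2 - 2 q.t + ||t||^2, for all pairs at once.
    d2 = (np.sum(query_X**2, axis=1)[:, None]
          - 2.0 * query_X @ train_X.T
          + np.sum(train_X**2, axis=1)[None, :])
    # Pick the k smallest distances per row (unordered), then sort them.
    idx = np.argpartition(d2, kth=k - 1, axis=1)[:, :k]
    order = np.argsort(np.take_along_axis(d2, idx, axis=1), axis=1)
    return np.take_along_axis(idx, order, axis=1)


if __name__ == "__main__":
    table = {"color": ["red", "blue", "red", "green"], "size": [1.0, 2.5, 3.0, 0.5]}
    enc, cls = encode_non_numeric(table)
    print(cls["color"], enc["color"].tolist())  # ['blue', 'green', 'red'] [2, 0, 2, 1]
    X = np.column_stack([enc["color"], enc["size"]])
    print(knn_indices(X, X[:1], k=2).tolist())  # [[0, 3]]
```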
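The Experiment Setup row lists a label smoothing of 0.1. Label smoothing replaces each one-hot target with a mix of the one-hot vector and a uniform distribution over the K classes. The sketch below assumes the convention of PyTorch's `CrossEntropyLoss(label_smoothing=...)`: the true class receives 1 − ε + ε/K and every other class receives ε/K. The quoted setup does not state which convention TabDPT uses, and `smoothed_cross_entropy` is a hypothetical name.

```python
import numpy as np


def smoothed_cross_entropy(logits, labels, smoothing=0.1):
    """Mean cross-entropy against label-smoothed targets.

    Assumed convention: target = (1 - smoothing) * one_hot + smoothing / K.
    `logits` has shape (n, K); `labels` holds n integer class indices.
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)
    n, k = logits.shape
    # Numerically stable log-softmax: subtract the row max before exp.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Uniform mass smoothing / K everywhere, plus 1 - smoothing on the label.
    targets = np.full((n, k), smoothing / k)
    targets[np.arange(n), labels] += 1.0 - smoothing
    return float(-(targets * log_probs).sum(axis=1).mean())


if __name__ == "__main__":
    plain = smoothed_cross_entropy([[2.0, 0.0]], [0], smoothing=0.0)
    smooth = smoothed_cross_entropy([[2.0, 0.0]], [0], smoothing=0.1)
    print(round(smooth - plain, 6))  # 0.1
```

In this two-class case the smoothed loss equals the unsmoothed loss plus ε/K times the logit gap (0.05 × 2 = 0.1). This shows how smoothing penalizes confident margins even when the prediction is correct.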