Large Scale Transfer Learning for Tabular Data via Language Modeling

Authors: Josh Gardner, Juan Perdomo, Ludwig Schmidt

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment (each variable is listed with its assessed result and the supporting LLM response):

Research Type: Experimental
LLM Response: Through evaluation across a test suite of 329 datasets, we find that TABULA-8B has zero-shot accuracy on unseen tables that is over 15 percentage points (pp) higher than random guessing, a feat that is not possible with existing state-of-the-art tabular prediction models (e.g., XGBoost, TabPFN). In the few-shot setting (1-32 shots), without any fine-tuning on the target datasets, TABULA-8B is 5-15 pp more accurate than XGBoost and TabPFN models that are explicitly trained on equal, or even up to 16× more, data.

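The zero- and few-shot protocol above relies on serializing table rows into text so a language model can predict the target column. The exact prompt format used by TABULA-8B is not given in this excerpt, so the snippet below is only a hypothetical sketch of the general idea; the field/label layout and the example table are illustrative, not the paper's format.

    # Hypothetical sketch: serialize tabular rows into a few-shot text prompt
    # for a language model. The serialization format is illustrative only; it
    # is not the format used by TABULA-8B.

    def serialize_row(row: dict, target: str) -> str:
        """Render one row's features as 'column: value' pairs, omitting the target."""
        return "; ".join(f"{k}: {v}" for k, v in row.items() if k != target)

    def build_prompt(shots: list[dict], query: dict, target: str) -> str:
        """Prepend k labeled examples (the 'shots'), then ask for the query row's label."""
        lines = []
        for shot in shots:
            lines.append(f"{serialize_row(shot, target)} -> {target}: {shot[target]}")
        lines.append(f"{serialize_row(query, target)} -> {target}:")
        return "\n".join(lines)

    # Toy usage with a made-up table.
    shots = [
        {"age": 34, "income": 52000, "default": "no"},
        {"age": 51, "income": 23000, "default": "yes"},
    ]
    query = {"age": 29, "income": 61000, "default": None}
    print(build_prompt(shots, query, target="default"))
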
Researcher Affiliation: Academia
LLM Response: University of Washington, Harvard University, Stanford University

Pseudocode: No
LLM Response: The paper describes methods and procedures in textual paragraphs and figures, but it does not include any explicitly labeled pseudocode or algorithm blocks.

Open Source Code: Yes
LLM Response: We release our model, code, and data along with the publication of this paper.

Open Datasets: Yes
LLM Response: T4 Dataset: The T4 dataset is available via public credentialized access on Hugging Face Datasets at https://huggingface.co/datasets/mlfoundations/t4-full. Because the dataset is derived from TabLib, users must first obtain permission to access TabLib at https://huggingface.co/datasets/approximatelabs/tablib-v1-full.

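A minimal sketch of what credentialized access might look like with the Hugging Face datasets library, assuming access to the gated T4 and TabLib repositories has already been approved and a user access token is available. The split name and streaming mode are assumptions, not details from the paper.

    # Minimal sketch (not from the paper): stream the gated T4 dataset after
    # access has been approved on Hugging Face. Requires `pip install datasets`
    # and a user access token with permission for the gated repositories.
    from datasets import load_dataset

    t4 = load_dataset(
        "mlfoundations/t4-full",
        split="train",       # assumed split name; check the dataset card
        streaming=True,      # avoid downloading the full corpus up front
        token="hf_...",      # placeholder; or run `huggingface-cli login` first
    )

    for example in t4.take(1):   # inspect the first streamed example
        print(example.keys())
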
Dataset Splits: Yes
LLM Response: Whenever possible, we perform hyperparameter tuning on XGBoost and TabPFN in order to maximize their performance. See Appendix D.2 for further details on baseline implementation and tuning. For each of the 10 independent trials, we tune the hyperparameters of the model. For XGBoost, we conduct 10 iterations of hyperparameter tuning using the Hyperopt hyperparameter optimization library and the hyperparameter grid defined in [16]. For TabPFN and L2-regularized logistic regression, we conduct a full grid search (since there is only a single hyperparameter).

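The tuning protocol described above (10 Hyperopt iterations for the XGBoost baseline) could look roughly like the sketch below. The search space and dataset here are placeholders for illustration; the paper uses the grid defined in its reference [16] and its own evaluation datasets.

    # Rough sketch of 10 Hyperopt iterations for an XGBoost baseline.
    # The search space and data are illustrative, not the paper's setup.
    from hyperopt import fmin, tpe, hp, STATUS_OK
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

    space = {
        "max_depth": hp.quniform("max_depth", 2, 10, 1),
        "learning_rate": hp.loguniform("learning_rate", -5, 0),
        "n_estimators": hp.quniform("n_estimators", 50, 500, 50),
    }

    def objective(params):
        model = XGBClassifier(
            max_depth=int(params["max_depth"]),
            learning_rate=params["learning_rate"],
            n_estimators=int(params["n_estimators"]),
        )
        model.fit(X_tr, y_tr)
        # Hyperopt minimizes, so return 1 - validation accuracy as the loss.
        return {"loss": 1.0 - model.score(X_val, y_val), "status": STATUS_OK}

    best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=10)
    print(best)
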
Hardware Specification: Yes
LLM Response: Our final training run for TABULA-8B took approximately 6 days on a single node of 8 NVIDIA 80GB A100 GPUs on a commercial cloud provider. For our TabLib filtering and XGBoost experiments, we used an academic CPU cluster. Our evaluations were distributed across two academic GPU clusters consisting of NVIDIA 40GB A40 GPUs and NVIDIA A100 GPUs.

Software Dependencies: No
LLM Response: The paper mentions several software tools and libraries, such as fastText, Hyperopt, TabPFN, and Llama 3, but it does not specify explicit version numbers for these software dependencies (e.g., 'fasttext library [27]' or 'official TabPFN implementation').

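Because the paper does not pin library versions, a reproducer may want to record the versions actually installed in their own environment. Below is a small, generic sketch using Python's importlib.metadata; the package names are assumptions about the usual PyPI distributions, not versions taken from the paper.

    # Record installed versions of the libraries named in the paper, since the
    # paper itself does not pin them. Package names are assumed PyPI names.
    from importlib.metadata import version, PackageNotFoundError

    for pkg in ["xgboost", "hyperopt", "tabpfn", "fasttext", "transformers", "datasets"]:
        try:
            print(f"{pkg}=={version(pkg)}")
        except PackageNotFoundError:
            print(f"{pkg}: not installed")
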
Experiment Setup: Yes
LLM Response: The final model is trained for 40k steps with a global batch size of 24 (with sample packing, this is roughly equivalent to a global batch size of 600 rows of tabular data). The model sees roughly 8B tokens during training; we note that this is less than 10% of the 100B tokens in T4, and less than one one-thousandth of TabLib itself. We fully fine-tune all model parameters, as opposed to parameter-efficient fine-tuning, since full fine-tuning consistently benefits from scale [24, 64]. Reproducibility details are given in Appendix B.

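The excerpt reports the headline hyperparameters (40k steps, global batch size 24, full-parameter fine-tuning) but not the training framework. One illustrative way to express them, assuming an 8-GPU node and the Hugging Face transformers TrainingArguments API, is sketched below; the learning rate, precision, checkpointing, and output path are placeholders rather than values from the paper.

    # Illustrative only: the reported hyperparameters expressed as Hugging Face
    # TrainingArguments. The paper's actual training stack is not specified in
    # this excerpt; learning rate and paths are placeholders.
    from transformers import TrainingArguments

    args = TrainingArguments(
        output_dir="tabula-8b-finetune",   # placeholder path
        max_steps=40_000,                  # "trained for 40k steps"
        per_device_train_batch_size=3,     # 3 per GPU x 8 GPUs = global batch size 24
        gradient_accumulation_steps=1,
        bf16=True,                         # assumption for 80GB A100 training
        gradient_checkpointing=True,       # assumption to fit an 8B model per GPU
        learning_rate=2e-5,                # placeholder; not stated in the excerpt
        logging_steps=100,
    )
    # Full fine-tuning: pass the base model to Trainer without freezing parameters
    # or attaching PEFT adapters, so all weights are updated.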