Retrieval & Fine-Tuning for In-Context Tabular Models

Authors: Valentin Thomas, Junwei Ma, Rasa Hosseinzadeh, Keyvan Golestan, Guangwei Yu, Maks Volkovs, Anthony L. Caterini

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive evaluation on 95 datasets curated by TabZilla from OpenML, upon which we establish a new state-of-the-art with LoCalPFN even with respect to tuned tree-based models.
Researcher Affiliation | Industry | Valentin Thomas (valentin.t@layer6.ai), Junwei Ma (jeremy@layer6.ai), Rasa Hosseinzadeh (rasa@layer6.ai), Keyvan Golestan (keyvan@layer6.ai), Guangwei Yu (guang@layer6.ai), Maksims Volkovs (maks@layer6.ai), Anthony Caterini (anthony@layer6.ai)
Pseudocode | No | The paper describes the methods using text and diagrams (e.g., Figure 3: 'Details of the architecture and the efficient context used during fine-tuning'), but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We release all code to reproduce our results at https://github.com/layer6ai-labs/LoCalPFN.
Open Datasets | Yes | We evaluate our methods against competitive baselines using 95 out of the 176 datasets from TabZilla [35], originally sourced from OpenML [5].
Dataset Splits | Yes | For each dataset, we use the splits from TabZilla with a train-validation-test ratio of 80:10:10.
Hardware Specification | Yes | All experiments for our proposed methods can be run on a machine with a single NVIDIA RTX 6000 Ada Generation GPU, 995Gi RAM, and an AMD Ryzen Threadripper PRO 5995WX 64-core CPU.
Software Dependencies | No | The paper mentions using the faiss [28, 17] library and the TabPFN repository, but it does not specify version numbers for these or other software dependencies.
Experiment Setup | Yes | For LoCalPFN experiments, we adopt the AdamW [32] optimizer with a learning rate of 0.01 and weight decay of 0.01. We do not use warmup or a learning rate scheduler. For the approximate local context during training, we use the same number of neighbours as TabPFN-kNN. We use a fixed number of query points (1,000) sampled from the training set and a batch size of 2. For our reported results, we also use one-hot encoding for neighbour retrieval and inference. In addition, we evaluate our model every 30 gradient steps and apply early stopping based on the validation-set AUC for each fold.
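
The Software Dependencies and Experiment Setup rows above mention faiss-based neighbour retrieval with one-hot encoded features. The following is a minimal sketch of that kind of local-context retrieval, not the LoCalPFN implementation itself: the index type (exact L2), the helper names (build_index, retrieve_local_context), and the toy data are all illustrative assumptions.

```python
# Hedged sketch: one-hot encode categorical features and use an exact faiss
# index to fetch, for each query point, the indices of its k nearest training
# points, which would then serve as the local context for the in-context model.
import numpy as np
import faiss
from sklearn.preprocessing import OneHotEncoder


def build_index(X_train_encoded: np.ndarray) -> faiss.IndexFlatL2:
    """Exact L2 index over the one-hot encoded training features (assumed metric)."""
    index = faiss.IndexFlatL2(X_train_encoded.shape[1])
    index.add(np.ascontiguousarray(X_train_encoded, dtype=np.float32))
    return index


def retrieve_local_context(index: faiss.IndexFlatL2, X_query_encoded: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k nearest training points for each query row."""
    _, neighbour_idx = index.search(np.ascontiguousarray(X_query_encoded, dtype=np.float32), k)
    return neighbour_idx


# Toy usage with random categorical data standing in for a tabular dataset.
rng = np.random.default_rng(0)
X_train_cat = rng.integers(0, 3, size=(500, 4))  # 4 categorical columns
X_query_cat = rng.integers(0, 3, size=(10, 4))

encoder = OneHotEncoder(handle_unknown="ignore").fit(X_train_cat)
index = build_index(encoder.transform(X_train_cat).toarray())
neighbours = retrieve_local_context(index, encoder.transform(X_query_cat).toarray(), k=50)
print(neighbours.shape)  # (10, 50): one 50-neighbour local context per query point
```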
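
The Experiment Setup row also specifies AdamW with learning rate 0.01 and weight decay 0.01, no warmup or scheduler, evaluation every 30 gradient steps, and early stopping on validation AUC. The sketch below wires those stated hyperparameters into a generic PyTorch fine-tuning loop; the model, batch iterator, AUC callback, and the patience value are placeholders and assumptions, not details taken from the paper.

```python
# Hedged sketch of the quoted fine-tuning configuration. `model(batch)` is assumed
# to return the training loss; `compute_val_auc` is an assumed callback that scores
# the model on the validation fold; `patience` is an illustrative choice.
import torch
from torch.optim import AdamW


def fine_tune(model, train_batches, compute_val_auc, eval_every: int = 30, patience: int = 5):
    # AdamW with lr=0.01 and weight_decay=0.01, no warmup or LR scheduler (as stated above).
    optimizer = AdamW(model.parameters(), lr=0.01, weight_decay=0.01)
    best_auc, evals_since_best = float("-inf"), 0
    best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}

    for step, batch in enumerate(train_batches, start=1):
        optimizer.zero_grad()
        loss = model(batch)          # placeholder forward pass returning the loss
        loss.backward()
        optimizer.step()

        if step % eval_every == 0:   # evaluate every 30 gradient steps
            val_auc = compute_val_auc(model)
            if val_auc > best_auc:
                best_auc, evals_since_best = val_auc, 0
                best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
            else:
                evals_since_best += 1
                if evals_since_best >= patience:   # early stopping on validation AUC
                    break

    model.load_state_dict(best_state)  # restore the best checkpoint seen during fine-tuning
    return model, best_auc
```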