Retrieval & Fine-Tuning for In-Context Tabular Models
Authors: Valentin Thomas, Junwei Ma, Rasa Hosseinzadeh, Keyvan Golestan, Guangwei Yu, Maks Volkovs, Anthony L. Caterini
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive evaluation on 95 datasets curated by TabZilla from OpenML, upon which we establish a new state-of-the-art with LoCalPFN even with respect to tuned tree-based models. |
| Researcher Affiliation | Industry | Valentin Thomas valentin.t@layer6.ai Junwei Ma jeremy@layer6.ai Rasa Hosseinzadeh rasa@layer6.ai Keyvan Golestan keyvan@layer6.ai Guangwei Yu guang@layer6.ai Maksims Volkovs maks@layer6.ai Anthony Caterini anthony@layer6.ai |
| Pseudocode | No | The paper describes the methods using text and diagrams (e.g., Figure 3: 'Details of the architecture and the efficient context used during fine-tuning'), but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release all code to reproduce our results at https://github.com/layer6ai-labs/LoCalPFN. |
| Open Datasets | Yes | We evaluate our methods against competitive baselines using 95 out of the 176 datasets from TabZilla [35], originally sourced from OpenML [5]. |
| Dataset Splits | Yes | For each dataset, we use the splits from TabZilla with a train-validation-test ratio of 80:10:10. |
| Hardware Specification | Yes | All experiments for our proposed methods can be run on a machine with a single NVIDIA RTX 6000 Ada Generation GPU, 995 GiB of RAM, and an AMD Ryzen Threadripper PRO 5995WX 64-core CPU. |
| Software Dependencies | No | The paper mentions using the faiss [28, 17] library and the TabPFN repository, but it does not specify version numbers for these or other software dependencies. (A retrieval sketch using faiss follows the table.) |
| Experiment Setup | Yes | For LoCalPFN experiments, we adopt the AdamW [32] optimizer with a learning rate of 0.01 and weight decay of 0.01. We do not use warmup or a learning rate scheduler. For the approximate local context during training, we use the same number of neighbours as TabPFN-kNN. We use a fixed number of query points (1,000) sampled from the training set and a batch size of 2. For our reported results, we also use one-hot encoding for neighbour retrieval and inference. In addition, we evaluate our model every 30 gradient steps and apply early stopping based on the validation set AUC for each fold respectively. (A fine-tuning sketch follows the table.) |
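
The neighbour retrieval described above relies on the faiss library applied to one-hot encoded features. Below is a minimal sketch of what such retrieval could look like; the function name `retrieve_local_context`, the exact-search `IndexFlatL2` choice, and the preprocessing details are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of kNN-based local-context retrieval with faiss.
# Not the authors' code: the one-hot preprocessing and the exact-search
# index type are assumptions for illustration.
import faiss
import numpy as np
from sklearn.preprocessing import OneHotEncoder  # `sparse_output` needs sklearn >= 1.2

def retrieve_local_context(X_train, X_query, categorical_cols, k=100):
    """Return indices of the k nearest training points for each query point."""
    # One-hot encode categorical columns before retrieval, as reported above.
    enc = OneHotEncoder(sparse_output=False, handle_unknown="ignore")
    cat_train = enc.fit_transform(X_train[:, categorical_cols])
    cat_query = enc.transform(X_query[:, categorical_cols])
    num_cols = [c for c in range(X_train.shape[1]) if c not in categorical_cols]
    Z_train = np.hstack([X_train[:, num_cols], cat_train]).astype("float32")
    Z_query = np.hstack([X_query[:, num_cols], cat_query]).astype("float32")

    # Exact L2 search; faiss also offers approximate indices for larger tables.
    index = faiss.IndexFlatL2(Z_train.shape[1])
    index.add(Z_train)
    _, neighbour_ids = index.search(Z_query, k)  # shape (n_query, k), indices into X_train
    return neighbour_ids
```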
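The experiment setup translates fairly directly into a fine-tuning loop. The sketch below wires the reported hyperparameters (AdamW with learning rate 0.01 and weight decay 0.01, no warmup or scheduler, 1,000 query points, batch size 2, evaluation every 30 gradient steps, early stopping on validation AUC) into generic PyTorch code; `model`, `sample_finetuning_batch`, `evaluate_auc`, `max_steps`, and `patience` are hypothetical placeholders rather than the released API.

```python
# Minimal sketch of the fine-tuning loop described in "Experiment Setup".
# Only the hyperparameters are taken from the paper; all helper names are
# hypothetical placeholders, not the authors' API.
import torch

def finetune(model, train_data, val_data, max_steps=10_000,
             n_query=1_000, batch_size=2, eval_every=30, patience=10):
    # AdamW with lr=0.01 and weight decay=0.01; no warmup, no LR scheduler.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2, weight_decay=1e-2)
    best_auc, best_state, stale = -float("inf"), None, 0

    for step in range(max_steps):
        # Each batch pairs query points sampled from the training set with
        # their approximate local (kNN) context, e.g. retrieved as sketched above.
        batch = sample_finetuning_batch(train_data, n_query=n_query,
                                        batch_size=batch_size)
        loss = model.loss(batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (step + 1) % eval_every == 0:
            auc = evaluate_auc(model, val_data)  # validation AUC for the current fold
            if auc > best_auc:
                best_auc, stale = auc, 0
                best_state = {k: v.detach().clone()
                              for k, v in model.state_dict().items()}
            else:
                stale += 1
                if stale >= patience:  # early stopping on validation AUC
                    break

    if best_state is not None:
        model.load_state_dict(best_state)
    return model
```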