Retrieval & Fine-Tuning for In-Context Tabular Models

Authors: Valentin Thomas, Junwei Ma, Rasa Hosseinzadeh, Keyvan Golestan, Guangwei Yu, Maks Volkovs, Anthony L. Caterini

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive evaluation on 95 datasets curated by TabZilla from OpenML, upon which we establish a new state-of-the-art with LoCalPFN even with respect to tuned tree-based models.
Researcher Affiliation | Industry | Valentin Thomas (valentin.t@layer6.ai), Junwei Ma (jeremy@layer6.ai), Rasa Hosseinzadeh (rasa@layer6.ai), Keyvan Golestan (keyvan@layer6.ai), Guangwei Yu (guang@layer6.ai), Maksims Volkovs (maks@layer6.ai), Anthony Caterini (anthony@layer6.ai)
Pseudocode | No | The paper describes the methods using text and diagrams (e.g., Figure 3: 'Details of the architecture and the efficient context used during fine-tuning'), but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We release all code to reproduce our results at https://github.com/layer6ai-labs/LoCalPFN.
Open Datasets | Yes | We evaluate our methods against competitive baselines using 95 out of the 176 datasets from TabZilla [35], originally sourced from OpenML [5].
Dataset Splits | Yes | For each dataset, we use the splits from TabZilla with a train-validation-test ratio of 80:10:10.
Hardware Specification | Yes | All experiments for our proposed methods can be run on a machine with a single NVIDIA RTX 6000 Ada Generation GPU, 995Gi RAM, and an AMD Ryzen Threadripper PRO 5995WX 64-core CPU.
Software Dependencies | No | The paper mentions using the faiss [28, 17] library and the TabPFN repository, but it does not specify version numbers for these or other software dependencies.
Experiment Setup | Yes | For LoCalPFN experiments, we adopt the AdamW [32] optimizer with a learning rate of 0.01 and weight decay of 0.01. We do not use warmup or a learning rate scheduler. For the approximate local context during training, we use the same number of neighbours as TabPFN-kNN. We use a fixed number of query points (1,000) sampled from the training set and a batch size of 2. For our reported results, we also use one-hot encoding for neighbour retrieval and inference. In addition, we evaluate our model every 30 gradient steps and apply early stopping based on the validation-set AUC for each fold.
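
The Software Dependencies and Experiment Setup rows above mention faiss-based neighbour retrieval with one-hot encoded features. The following is a minimal sketch of that kind of local-context retrieval, not the LoCalPFN implementation itself: the index type (exact L2), the helper names (build_index, retrieve_local_context), and the toy data are all illustrative assumptions.

```python
# Hedged sketch: one-hot encode categorical features and use an exact faiss
# index to fetch, for each query point, the indices of its k nearest training
# points, which would then serve as the local context for the in-context model.
import numpy as np
import faiss
from sklearn.preprocessing import OneHotEncoder


def build_index(X_train_encoded: np.ndarray) -> faiss.IndexFlatL2:
    """Exact L2 index over the one-hot encoded training features (assumed metric)."""
    index = faiss.IndexFlatL2(X_train_encoded.shape[1])
    index.add(np.ascontiguousarray(X_train_encoded, dtype=np.float32))
    return index


def retrieve_local_context(index: faiss.IndexFlatL2, X_query_encoded: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k nearest training points for each query row."""
    _, neighbour_idx = index.search(np.ascontiguousarray(X_query_encoded, dtype=np.float32), k)
    return neighbour_idx


# Toy usage with random categorical data standing in for a tabular dataset.
rng = np.random.default_rng(0)
X_train_cat = rng.integers(0, 3, size=(500, 4))  # 4 categorical columns
X_query_cat = rng.integers(0, 3, size=(10, 4))

encoder = OneHotEncoder(handle_unknown="ignore").fit(X_train_cat)
index = build_index(encoder.transform(X_train_cat).toarray())
neighbours = retrieve_local_context(index, encoder.transform(X_query_cat).toarray(), k=50)
print(neighbours.shape)  # (10, 50): one 50-neighbour local context per query point
```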
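
The Experiment Setup row also specifies AdamW with learning rate 0.01 and weight decay 0.01, no warmup or scheduler, evaluation every 30 gradient steps, and early stopping on validation AUC. The sketch below wires those stated hyperparameters into a generic PyTorch fine-tuning loop; the model, batch iterator, AUC callback, and the patience value are placeholders and assumptions, not details taken from the paper.

```python
# Hedged sketch of the quoted fine-tuning configuration. `model(batch)` is assumed
# to return the training loss; `compute_val_auc` is an assumed callback that scores
# the model on the validation fold; `patience` is an illustrative choice.
import torch
from torch.optim import AdamW


def fine_tune(model, train_batches, compute_val_auc, eval_every: int = 30, patience: int = 5):
    # AdamW with lr=0.01 and weight_decay=0.01, no warmup or LR scheduler (as stated above).
    optimizer = AdamW(model.parameters(), lr=0.01, weight_decay=0.01)
    best_auc, evals_since_best = float("-inf"), 0
    best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}

    for step, batch in enumerate(train_batches, start=1):
        optimizer.zero_grad()
        loss = model(batch)          # placeholder forward pass returning the loss
        loss.backward()
        optimizer.step()

        if step % eval_every == 0:   # evaluate every 30 gradient steps
            val_auc = compute_val_auc(model)
            if val_auc > best_auc:
                best_auc, evals_since_best = val_auc, 0
                best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
            else:
                evals_since_best += 1
                if evals_since_best >= patience:   # early stopping on validation AUC
                    break

    model.load_state_dict(best_state)  # restore the best checkpoint seen during fine-tuning
    return model, best_auc
```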