Transfer Learning with Deep Tabular Models

Authors: Roman Levin, Valeriia Cherepanova, Avi Schwarzschild, Arpit Bansal, C. Bayan Bruss, Tom Goldstein, Andrew Gordon Wilson, Micah Goldblum

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we explore the benefits that representation learning provides for knowledge transfer in the tabular domain. We conduct experiments in a realistic medical diagnosis test bed with limited amounts of downstream data and find that transfer learning with deep tabular models provides a definitive advantage over gradient boosted decision tree methods. We further compare the supervised and self-supervised pre-training strategies and provide practical advice on transfer learning with tabular models.
Researcher Affiliation | Collaboration | Roman Levin (University of Washington), Valeriia Cherepanova (University of Maryland), Avi Schwarzschild (University of Maryland), Arpit Bansal (University of Maryland), C. Bayan Bruss (Capital One), Tom Goldstein (University of Maryland), Andrew Gordon Wilson (New York University), Micah Goldblum (New York University)
Pseudocode | No | The paper describes procedures and methods in paragraph form (e.g., the pseudo-feature method in Section 6), but it does not include any clearly labeled "Pseudocode" or "Algorithm" blocks, nor does it present structured steps formatted like code.
Open Source Code | Yes | We include the code for reproducing our results in the supplementary materials.
Open Datasets | Yes | We thus construct a suite of transfer learning benchmarks using the MetaMIMIC repository (Grzyb et al., 2021; Woźnica et al., 2022), which is based on the MIMIC-IV (Johnson et al., 2021; Goldberger et al., 2000) clinical database of anonymized patient data from the Beth Israel Deaconess Medical Center ICU admissions.
Dataset Splits | Yes | We reserve 6985 patients (20% of the MetaMIMIC data) for the downstream test set, and use 22701 patients (65% of the MetaMIMIC data) for training and 5239 patients (15% of the MetaMIMIC data) as a validation set for hyperparameter tuning of the upstream feature extractors. (A sketch of this split follows the table below.)
Hardware Specification | Yes | We ran our experiments on NVIDIA GeForce RTX 2080 Ti machines.
Software Dependencies | No | The paper lists software names such as "RTDL", "TabTransformer", "CatBoost", and "XGBoost" and their licenses, but it does not provide specific version numbers for these software components (e.g., CatBoost 0.25).
Experiment Setup | Yes | All deep models are trained with the AdamW optimizer (Loshchilov and Hutter, 2017). We pre-train models on upstream data for 500 epochs with patience set to 30... We use learning rate 1e-4 for training from scratch on downstream data and learning rate 5e-5 for fine-tuning pre-trained models. For pre-training, learning rate and weight decay are tunable hyperparameters for each model... Batch size was set to 256 in all transfer-learning experiments. Appendix C also details hyperparameter search spaces and distributions. (A training-setup sketch follows the table below.)
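
For context on the reported split sizes, below is a minimal sketch of a 65/15/20 patient-level split. The file path, the use of pandas, and scikit-learn's train_test_split are illustrative assumptions, not taken from the authors' released code.

# Minimal sketch of the reported 65/15/20 MetaMIMIC split (illustrative only;
# the file path and splitting utility are assumptions, not the authors' code).
import pandas as pd
from sklearn.model_selection import train_test_split

metamimic = pd.read_csv("metamimic.csv")  # hypothetical path to the MetaMIMIC table

# Hold out 20% of patients as the downstream test set.
train_val, test = train_test_split(metamimic, test_size=0.20, random_state=0)

# Split the remaining 80% into 65% upstream training and 15% validation
# (0.15 / 0.80 = 0.1875 of the remaining patients).
train, val = train_test_split(train_val, test_size=0.15 / 0.80, random_state=0)

print(len(train), len(val), len(test))  # roughly 22701 / 5239 / 6985 patients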
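The next sketch illustrates the reported optimization setup (AdamW, learning rate 1e-4 when training from scratch or 5e-5 when fine-tuning, batch size 256 set in the data loader, early stopping with patience 30). The model, data loaders, binary cross-entropy objective, and function name are placeholders assumed for illustration, not the authors' implementation.

# Minimal sketch of the reported training setup in PyTorch; the objective and
# helper name (finetune) are assumptions for illustration.
import torch
from torch.optim import AdamW

def finetune(model, train_loader, val_loader, pretrained=True,
             max_epochs=500, patience=30):
    # 5e-5 for fine-tuning a pre-trained model, 1e-4 when training from scratch.
    lr = 5e-5 if pretrained else 1e-4
    optimizer = AdamW(model.parameters(), lr=lr)
    loss_fn = torch.nn.BCEWithLogitsLoss()  # assumed binary diagnosis objective

    best_val, epochs_since_best = float("inf"), 0
    for epoch in range(max_epochs):
        model.train()
        for x, y in train_loader:  # batch size 256 is configured in the loader
            optimizer.zero_grad()
            loss = loss_fn(model(x).squeeze(-1), y.float())
            loss.backward()
            optimizer.step()

        # Validation pass for early stopping.
        model.eval()
        with torch.no_grad():
            val_loss = sum(loss_fn(model(x).squeeze(-1), y.float()).item()
                           for x, y in val_loader) / len(val_loader)

        if val_loss < best_val:
            best_val, epochs_since_best = val_loss, 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                break  # stop after 30 epochs without validation improvement
    return model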