TransTab: Learning Transferable Tabular Transformers Across Tables

Authors: Zifeng Wang, Jimeng Sun

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we aim at answering the following questions through extensive experiments: Q1. How does TransTab perform compared with baselines under the vanilla supervised setting? Q2. How well does TransTab address incremental columns from a stream of data (S(2) in Fig. 1)? Q3. How does learning from multiple tables (with different columns) drawn from the same domain affect TransTab's predictive ability (S(1) in Fig. 1)? Q4. Can TransTab be a zero-shot learner when it is pretrained on tables and infers on a new table (S(4) in Fig. 1)? Q5. Is the proposed vertical-partition CL better than vanilla supervised pretraining and self-supervised CL (S(3) in Fig. 1)?
Researcher Affiliation | Academia | Zifeng Wang (1) and Jimeng Sun (1,2); (1) Department of Computer Science, University of Illinois Urbana-Champaign; (2) Carle Illinois College of Medicine, University of Illinois Urbana-Champaign
Pseudocode | No | The paper describes the method using text and diagrams, but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Our package is available at https://github.com/RyanWangZf/transtab with documentation at https://transtab.readthedocs.io/en/latest/. (A hedged usage sketch of the package follows this table.)
Open Datasets | Yes | We introduce clinical trial mortality prediction datasets where each includes a distinct group of patients and columns (footnote 3). The data statistics are in Table 1. Footnote 3 links to https://data.projectdatasphere.org/projectdatasphere/html/access
Dataset Splits | No | The paper mentions that 'A patience of 10 is kept for supervised training for early stopping', which implies the use of a validation set, but it does not explicitly provide the split percentages or methodology for the train/validation/test splits of the datasets.
Hardware Specification | Yes | Experiments were conducted with one RTX3070 GPU, an i7-10700 CPU, and 16GB RAM.
Software Dependencies | No | The paper mentions using the 'Adam optimizer' and cites 'PyTorch' in the references, but it does not specify version numbers for these or other software dependencies used in the experiments.
Experiment Setup | Yes | TransTab uses 2 layers of gated transformers, where the embedding dimensions of numbers and tokens are 128 and the hidden dimension of the intermediate dense layers is 256. The attention module has 8 heads. We choose ReLU activations and do not activate dropout. We train TransTab using the Adam optimizer [27] with learning rate in {2e-5, 5e-5, 1e-4} and no weight decay; batch size is in {16, 64, 128}. We set a maximum of 50 self-supervised pretraining epochs and 100 supervised training epochs. A patience of 10 is kept for supervised training for early stopping. (These reported hyperparameters are collected in a configuration sketch below.)
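For context on the released package noted in the Open Source Code row, the repository's quick-start roughly follows a load-build-train pattern. The sketch below is a minimal, hedged reconstruction: the function names (transtab.load_data, transtab.build_classifier, transtab.train) and keyword arguments are assumptions based on the linked documentation and may differ across package versions.

```python
# Hedged sketch of the transtab package's quick-start workflow (assumed API;
# see https://transtab.readthedocs.io/en/latest/ for the authoritative version).
import transtab

# Load a benchmark table; the loader is assumed to return the data splits plus
# the categorical / numerical / binary column lists that the model needs.
allset, trainset, valset, testset, cat_cols, num_cols, bin_cols = \
    transtab.load_data('credit-g')

# Build a column-aware classifier from the three column groups.
model = transtab.build_classifier(cat_cols, num_cols, bin_cols)

# Supervised training; the keyword names are assumptions mirroring the paper's
# reported setup (Adam optimizer, early stopping with patience 10).
transtab.train(model, trainset, valset,
               num_epoch=100, patience=10, lr=1e-4, batch_size=64)
```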
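The Experiment Setup row quotes concrete hyperparameters; collecting them in one place, a purely illustrative configuration sketch might look as follows. The dictionary keys are hypothetical and are not taken from the TransTab codebase; only the values come from the quoted text.

```python
# Illustrative collection of the hyperparameters quoted in the Experiment Setup
# row; key names are hypothetical, values are as reported in the paper.
transtab_hparams = {
    "num_transformer_layers": 2,          # gated transformer layers
    "embedding_dim": 128,                 # number and token embeddings
    "ffn_hidden_dim": 256,                # intermediate dense layers
    "num_attention_heads": 8,
    "activation": "relu",
    "dropout": 0.0,                       # dropout not activated
    "optimizer": "adam",
    "learning_rate_grid": [2e-5, 5e-5, 1e-4],
    "weight_decay": 0.0,
    "batch_size_grid": [16, 64, 128],
    "max_pretrain_epochs": 50,            # self-supervised pretraining
    "max_supervised_epochs": 100,
    "early_stopping_patience": 10,        # supervised training only
}
```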