TransTab: Learning Transferable Tabular Transformers Across Tables
Authors: Zifeng Wang, Jimeng Sun
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we aim at answering the following questions by extensive experiments: Q1. How does TransTab perform compared with baselines under the vanilla supervised setting? Q2. How well does TransTab address incremental columns from a stream of data (S(2) in Fig. 1)? Q3. How is the impact of TransTab learned from multiple tables (with different columns) drawn from the same domain on its predictive ability (S(1) in Fig. 1)? Q4. Can TransTab be a zero-shot learner when pretrained on tables and infer on a new table (S(4) in Fig. 1)? Q5. Is the proposed vertical partition CL better than vanilla supervised pretraining and self-supervised CL (S(3) in Fig. 1)? |
| Researcher Affiliation | Academia | Zifeng Wang¹ and Jimeng Sun¹,²; ¹Department of Computer Science, University of Illinois Urbana-Champaign; ²Carle Illinois College of Medicine, University of Illinois Urbana-Champaign |
| Pseudocode | No | The paper describes the method using text and diagrams, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our package is available at https://github.com/RyanWangZf/transtab with documentation at https://transtab.readthedocs.io/en/latest/. |
| Open Datasets | Yes | We introduce clinical trial mortality prediction datasets where each includes a distinct group of patients and columns³. The data statistics are in Table 1. Footnote 3 links to: https://data.projectdatasphere.org/projectdatasphere/html/access |
| Dataset Splits | No | The paper mentions that 'A patience of 10 is kept for supervised training for early stopping', which implies the use of a validation set, but it does not explicitly provide the split percentages or the methodology for the train/validation/test splits of the datasets. |
| Hardware Specification | Yes | Experiments were conducted with one RTX3070 GPU, i7-10700 CPU, and 16GB RAM. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' and cites PyTorch in the references, but it does not specify version numbers for these or any other software dependencies used in the experiments. |
| Experiment Setup | Yes | TransTab uses 2 layers of gated transformers where the embedding dimensions of numbers and tokens are 128, and the hidden dimension of intermediate dense layers is 256. The attention module has 8 heads. We choose ReLU activations and do not activate dropout. We train TransTab using Adam optimizer [27] with learning rate in {2e-5, 5e-5, 1e-4} and no weight decay; batch size is in {16, 64, 128}. We set a maximum self-supervised pretraining epochs of 50 and supervised training epochs of 100. A patience of 10 is kept for supervised training for early stopping. |
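
The Experiment Setup row fixes all architectural and optimization hyperparameters, so the reported configuration can be transcribed directly. Below is a minimal PyTorch sketch, not the authors' implementation: the stock `nn.TransformerEncoderLayer` stands in for TransTab's gated transformer layer, and all constant names are illustrative.

```python
# Sketch of the reported TransTab training configuration (assumptions noted above):
# 2 transformer layers, embedding dim 128, feed-forward dim 256, 8 heads,
# ReLU, no dropout, Adam with no weight decay, early stopping patience of 10.
import torch
from torch import nn

EMBED_DIM = 128                       # embedding dimension of numbers and tokens
FFN_DIM = 256                         # hidden dimension of intermediate dense layers
NUM_HEADS = 8                         # attention heads
NUM_LAYERS = 2                        # gated transformer layers
LEARNING_RATES = [2e-5, 5e-5, 1e-4]   # searched values; no weight decay
BATCH_SIZES = [16, 64, 128]           # searched values
MAX_PRETRAIN_EPOCHS = 50              # self-supervised pretraining
MAX_SUPERVISED_EPOCHS = 100           # supervised training
EARLY_STOP_PATIENCE = 10              # early stopping on validation performance

# Stand-in encoder with the same dimensions as the reported gated transformer.
encoder_layer = nn.TransformerEncoderLayer(
    d_model=EMBED_DIM,
    nhead=NUM_HEADS,
    dim_feedforward=FFN_DIM,
    dropout=0.0,            # dropout not activated
    activation="relu",
    batch_first=True,
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=NUM_LAYERS)

# Adam optimizer with one of the searched learning rates and no weight decay.
optimizer = torch.optim.Adam(
    encoder.parameters(), lr=LEARNING_RATES[0], weight_decay=0.0
)
```

Note that the released `transtab` package wraps this configuration behind its own builder and training utilities; the sketch only makes the reported hyperparameter values concrete.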