SwitchTab: Switched Autoencoders Are Effective Tabular Learners
Authors: Jing Wu, Suiyao Chen, Qi Zhao, Renat Sergazinov, Chen Li, Shengjie Liu, Chongchao Zhao, Tianpei Xie, Hanqing Guo, Cheng Ji, Daniel Cociorva, Hakan Brunzell
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate the effectiveness of SwitchTab, we conduct extensive experiments across various domains involving tabular data. The results showcase superior performance in end-to-end prediction tasks with fine-tuning. Moreover, we demonstrate that pre-trained salient embeddings can be utilized as plug-and-play features to enhance the performance of various traditional classification methods (e.g., Logistic Regression, XGBoost, etc.). Lastly, we highlight the capability of SwitchTab to create explainable representations through visualization of decoupled mutual and salient features in the latent space. |
| Researcher Affiliation | Industry | Jing Wu*, Suiyao Chen*, Qi Zhao, Renat Sergazinov, Chen Li, Shengjie Liu, Chongchao Zhao, Tianpei Xie, Hanqing Guo, Cheng Ji, Daniel Cociorva, Hakan Brunzell. Amazon Buyer Risk Prevention, 4575 La Jolla Village Dr, San Diego, California 92122, USA. {jingwua, suiyaoc, qqzhao, renserg, chenlii, zycjlsj, zchongch, lukexie, hanqiguo, cjiamzn, cociorva, brunzell}@amazon.com |
| Pseudocode | Yes | Algorithm 1: Self-supervised Learning with Switch Tab |
| Open Source Code | No | The paper does not include any explicit statements about releasing source code or provide a link to a code repository. |
| Open Datasets | Yes | We first evaluate the performance of SwitchTab on a standard benchmark from (Gorishniy et al. 2021). Concretely, the datasets include: California Housing (CA) (Pace and Barry 1997), Adult (AD) (Kohavi et al. 1996), Helena (HE) (Guyon et al. 2019b), Jannis (JA) (Guyon et al. 2019b), Higgs (HI) (Baldi, Sadowski, and Whiteson 2014), ALOI (AL) (Geusebroek, Burghouts, and Smeulders 2005), Epsilon (EP) (Yuan, Ho, and Lin 2011), Year (YE) (Bertin-Mahieux et al. 2011), Covertype (CO) (Blackard and Dean 1999), Yahoo (YA) (Chapelle and Chang 2011), Microsoft (MI) (Qin and Liu 2013). Besides the standard benchmarks, there is also another set of popular datasets used by recent work (Somepalli et al. 2021), including Bank (BK) (Moro, Cortez, and Rita 2014), Blastchar (BC) (Ouk, Dada, and Kang 2018), Arrhythmia (AT) (Liu, Ting, and Zhou 2008; Ouk, Dada, and Kang 2018), Arcene (AR) (Asuncion and Newman 2007), Shoppers (SH) (Sakar et al. 2019), Volkert (VO) (Guyon et al. 2019a) and MNIST (MN) (Xiao, Rasul, and Vollgraf 2017). |
| Dataset Splits | No | The paper mentions using standard benchmarks and fine-tuning results according to established paradigms, but it does not explicitly provide specific training/validation/test split percentages or sample counts for its experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper mentions using the RMSprop and Adam optimizers, but it does not specify any software versions (e.g., programming language versions, specific machine learning libraries, or other dependencies with version numbers). |
| Experiment Setup | Yes | For feature corruption, we uniformly sample a subset of features for each sample to generate a corrupted view at a fixed corruption ratio of 0.3. For the encoder f, we employ a three-layer transformer with two heads. Both projectors ps and pm consist of one linear layer, followed by a sigmoid activation function. Additionally, the decoder d remains a one-layer network with a sigmoid activation function. For all pre-training, we train all models for 1000 epochs with a default batch size of 128. We use the RMSprop optimizer (Hinton, Srivastava, and Swersky 2012) with an initial learning rate of 0.0003. During the fine-tuning stage, we set the maximum number of epochs to 200 and use the Adam optimizer with a learning rate of 0.001. |
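The pseudocode row above names Algorithm 1 (self-supervised learning with SwitchTab) without reproducing it. Based on the quoted description of the encoder f, the salient/mutual projectors ps and pm, and the decoder d, the sketch below illustrates one "switched" reconstruction step: each sample's salient part is decoded together with the *other* sample's mutual part, and both combinations should recover the original sample. All dimensions, the linear stand-in layers, and the weight initialization are illustrative assumptions; the paper's actual encoder is a three-layer, two-head transformer, and the exact loss composition may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SwitchTabSketch:
    """Toy forward pass of the switched-autoencoder step (cf. Algorithm 1).
    Single linear maps stand in for the paper's transformer encoder."""

    def __init__(self, d_in, d_hid):
        s = 1.0 / np.sqrt(d_in)
        self.W_f = rng.normal(0, s, (d_in, d_hid))      # encoder f
        self.W_s = rng.normal(0, s, (d_hid, d_hid))     # salient projector p_s
        self.W_m = rng.normal(0, s, (d_hid, d_hid))     # mutual projector p_m
        self.W_d = rng.normal(0, s, (2 * d_hid, d_in))  # decoder d

    def encode(self, x):
        z = x @ self.W_f
        # split the embedding into salient (sample-specific) and
        # mutual (shared) parts via the two projectors
        return sigmoid(z @ self.W_s), sigmoid(z @ self.W_m)

    def decode(self, m, s):
        # decoder consumes a (mutual, salient) pair, as in the paper
        return sigmoid(np.concatenate([m, s], axis=1) @ self.W_d)

    def step(self, x1, x2):
        s1, m1 = self.encode(x1)
        s2, m2 = self.encode(x2)
        # plain reconstructions keep each sample's own parts;
        # "switched" ones swap the mutual parts across samples
        recons = [(self.decode(m1, s1), x1), (self.decode(m2, s1), x1),
                  (self.decode(m2, s2), x2), (self.decode(m1, s2), x2)]
        return sum(np.mean((r - t) ** 2) for r, t in recons)
```

Because the salient features should carry all sample-specific information, swapping the mutual parts ideally leaves the reconstruction target unchanged; minimizing all four reconstruction errors is what encourages the decoupling the paper visualizes.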
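The experiment-setup row states that a corrupted view is generated by uniformly sampling a subset of features per sample at a fixed ratio of 0.3, but the quoted excerpt does not say how the corrupted entries are filled in. A common choice in self-supervised tabular learning is to resample each corrupted entry from the empirical marginal of that feature (i.e., from another row of the same column); the sketch below adopts that assumption, and the function name and signature are hypothetical.

```python
import numpy as np

def corrupt_features(X, corruption_ratio=0.3, rng=None):
    """Return a corrupted view of X (n_samples x n_features).

    For each sample, roughly `corruption_ratio` of the features are
    replaced. Replacement values are drawn from random rows of the
    same column (empirical marginals) -- an assumption, since the
    excerpt only specifies the 0.3 ratio, not the fill-in strategy.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    mask = rng.random((n, d)) < corruption_ratio       # which entries to corrupt
    idx = rng.integers(0, n, size=(n, d))              # donor row per entry
    X_shuffled = X[idx, np.arange(d)]                  # column-wise resampling
    return np.where(mask, X_shuffled, X)
```

Sampling the mask independently per entry keeps the expected fraction of corrupted features at the stated ratio while varying the corrupted subset across samples, which matches the "uniformly sample a subset of features for each sample" phrasing.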