TabNet: Attentive Interpretable Tabular Learning
Authors: Sercan Ö. Arık, Tomas Pfister
AAAI 2021, pp. 6679-6687 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that TabNet outperforms other variants on a wide range of non-performance-saturated tabular datasets and yields interpretable feature attributions plus insights into its global behavior. We study TabNet in a wide range of problems that contain regression or classification tasks, particularly with published benchmarks. For all datasets, categorical inputs are mapped to a single-dimensional trainable scalar with a learnable embedding, and numerical columns are input without any preprocessing. Table 1 shows that TabNet outperforms others (Tree Ensembles (Geurts, Ernst, and Wehenkel 2006), LASSO regularization, L2X (Chen et al. 2018)) and is on par with INVASE (Yoon, Jordon, and van der Schaar 2019). |
| Researcher Affiliation | Industry | Sercan O. Arık, Tomas Pfister Google Cloud AI Sunnyvale, CA soarik@google.com, tpfister@google.com |
| Pseudocode | No | The paper describes the architecture and its components (e.g., the feature transformer and the attentive transformer) with mathematical formulations and diagrams (Figure 4), but it does not include any explicit pseudocode blocks or algorithms labeled as such. (A minimal sketch of the attentive transformer's sparse masking step is given after this table.) |
| Open Source Code | No | An open-source implementation will be released. |
| Open Datasets | Yes | We study TabNet in a wide range of problems that contain regression or classification tasks, particularly with published benchmarks. For all datasets, categorical inputs are mapped to a single-dimensional trainable scalar with a learnable embedding, and numerical columns are input without any preprocessing. We consider 6 tabular datasets from (Chen et al. 2018) (consisting of 10k training samples). Forest Cover Type (Dua and Graff 2017). Poker Hand (Dua and Graff 2017). Sarcos (Vijayakumar and Schaal 2000). Higgs Boson (Dua and Graff 2017). Rossmann Store Sales (https://www.kaggle.com/c/rossmann-store-sales). UCI Machine Learning Repository. URL http://archive.ics.uci.edu/ml. Accessed: 2019-11-10. (Dua and Graff 2017) |
| Dataset Splits | No | The paper states, 'For all experiments we cite, we use the same training, validation and testing data split with the original work.' and 'Hyperparameters of the TabNet models are optimized on a validation set and listed in Appendix.' However, it does not explicitly provide the specific percentages or counts for the training, validation, and testing splits within the main text for all datasets. For the synthetic datasets, it only mentions '10k training samples'. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments. It only generally mentions gradient descent-based optimization and that 'For faster training, we use large batch sizes with BN'. |
| Software Dependencies | No | The paper mentions the 'Adam optimization algorithm (Kingma and Ba 2014)' and 'Glorot uniform initialization' but does not specify software dependencies with version numbers (e.g., Python version, specific deep learning framework versions such as PyTorch or TensorFlow, CUDA version, or other libraries). |
| Experiment Setup | Yes | Hyperparameters of the TabNet models are optimized on a validation set and listed in the Appendix. The Adam optimization algorithm (Kingma and Ba 2014) and Glorot uniform initialization are used for training of all models. We use standard classification (softmax cross entropy) and regression (mean squared error) loss functions and we train until convergence. For faster training, we use large batch sizes with BN. Thus, except for the one applied to the input features, we use the ghost BN (Hoffer, Hubara, and Soudry 2017) form, with a virtual batch size B_V and momentum m_B. To further control the sparsity of the selected features, we propose sparsity regularization in the form of entropy... with a coefficient λ_sparse. (Minimal sketches of ghost BN and the sparsity regularizer are given after this table.) |
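
The attentive transformer described in the paper selects features at each decision step with a sparsemax-normalized mask modulated by a prior-scale term. The sketch below is an illustrative NumPy reconstruction of that masking step, not the authors' released implementation; the callable `h` stands in for the step's trainable FC + ghost-BN block, and `gamma` is the relaxation parameter from the paper.

```python
import numpy as np

def sparsemax(z):
    """Sparsemax of a 1-D score vector (Martins and Astudillo 2016).

    Projects the scores onto the probability simplex, yielding exact zeros,
    which is how TabNet's attentive transformer picks a sparse feature subset.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]              # scores in descending order
    k = np.arange(1, z.size + 1)
    z_cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > z_cumsum    # entries that stay nonzero
    k_z = k[support][-1]                     # size of the support
    tau = (z_cumsum[k_z - 1] - 1.0) / k_z    # threshold
    return np.maximum(z - tau, 0.0)

def attentive_step(a_prev, prior, h, gamma=1.5):
    """One feature-selection step of the attentive transformer (illustrative).

    M[i] = sparsemax(P[i-1] * h_i(a[i-1]))
    P[i] = P[i-1] * (gamma - M[i])
    """
    mask = sparsemax(prior * h(a_prev))   # sparse mask over features, sums to 1
    prior = prior * (gamma - mask)        # discourage reusing already-selected features
    return mask, prior
```

With gamma = 1 a feature can only be selected at one decision step; larger gamma relaxes this and allows features to be reused across steps, as stated in the paper.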
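
The Experiment Setup row quotes the use of ghost BN with a virtual batch size B_V. The following is a training-time-only sketch under the assumption that running statistics and the momentum m_B are handled elsewhere; the function name and scalar `gamma`/`beta` parameters are illustrative.

```python
import numpy as np

def ghost_batch_norm(x, virtual_batch_size, gamma=1.0, beta=0.0, eps=1e-5):
    """Ghost batch normalization sketch (Hoffer, Hubara, and Soudry 2017).

    The large batch x of shape (batch, features) is split into virtual
    batches of roughly `virtual_batch_size` rows, and each chunk is
    normalized with its own mean and variance.
    """
    n_chunks = max(1, x.shape[0] // virtual_batch_size)
    chunks = np.array_split(x, n_chunks, axis=0)
    normed = []
    for c in chunks:
        mu = c.mean(axis=0, keepdims=True)
        var = c.var(axis=0, keepdims=True)
        normed.append(gamma * (c - mu) / np.sqrt(var + eps) + beta)
    return np.concatenate(normed, axis=0)
```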
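
The sparsity regularization quoted above is an entropy term over the feature-selection masks, averaged over decision steps and batch and added to the task loss with weight λ_sparse. A minimal sketch, assuming `masks` is a list of per-step (batch, n_features) mask arrays such as those returned by the attentive-transformer sketch:

```python
import numpy as np

def sparsity_loss(masks, eps=1e-15):
    """Entropy-style sparsity regularizer over TabNet's feature-selection masks."""
    n_steps = len(masks)
    batch = masks[0].shape[0]
    total = 0.0
    for m in masks:
        total += np.sum(-m * np.log(m + eps))   # entropy of each row's mask
    return total / (n_steps * batch)
```

During training this value would be scaled by λ_sparse and added to the softmax cross-entropy or mean-squared-error objective.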