Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Interpretable Additive Tabular Transformer Networks
Authors: Anton Frederik Thielmann, Arik Reuter, Thomas Kneib, David Rügamer, Benjamin Säfken
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate its efficacy, we conduct experiments on multiple datasets and find that NATT performs on par with state-of-the-art methods on tabular data and surpasses other interpretable approaches. We validate the effectiveness of our model on 8 machine learning benchmark datasets for both classification and regression. We perform 5-fold cross validation on all datasets and report the average performance as well as the standard deviations. For the classification tasks we report the Area under the curve (AUC). For the regression tasks we report the root mean squared error (RMSE). |
| Researcher Affiliation | Academia | Anton Frederik Thielmann (EMAIL), Institute of Mathematics, Clausthal University of Technology; Arik Reuter (EMAIL), Institute of Mathematics, Clausthal University of Technology; Thomas Kneib (EMAIL), Chair of Statistics and Campus Institute Data Science, Georg-August-Universität Göttingen; David Rügamer (EMAIL), Department of Statistics, LMU Munich, Munich Center for Machine Learning (MCML); Benjamin Säfken (EMAIL), Institute of Mathematics, Clausthal University of Technology |
| Pseudocode | No | The paper describes the methodology and model architecture using mathematical formulas and diagrams, such as Figure 1 showing the NATT model architecture. However, there are no explicitly labeled 'Pseudocode' or 'Algorithm' sections or code-like formatted procedures. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology, nor does it provide a direct link to a code repository. While supplementary material is mentioned for data and hyperparameters, it does not explicitly state that code is included there. |
| Open Datasets | Yes | Classification datasets. We report performance on the Adult dataset for predicting a person's income (Kohavi et al., 1996), the Titanic dataset retrieved from Kaggle, for predicting the survival of Titanic passengers, the Churn dataset retrieved from Kaggle, covering whether a customer left a bank or not, and the Insurance dataset. Regression datasets. We report performances on another Insurance dataset (Lantz, 2019), 2 Airbnb datasets with data from the cities of Munich and Amsterdam. Lastly, we include the Abalone dataset retrieved from the UCI (Dua and Graff, 2017) as a dataset with only a single categorical variable and 3 categories. |
| Dataset Splits | Yes | We perform 5-fold cross validation on all datasets and report the average performance as well as the standard deviations. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. It describes the experimental setup in terms of models, datasets, and hyperparameters, but omits hardware specifications. |
| Software Dependencies | No | The paper mentions software like XGBoost, LightGBM, and neural network frameworks. For XGBoost, it refers to "the implementation provided by Chen and Guestrin (2016)". However, it does not provide specific version numbers for any of the software dependencies or libraries used, such as Python, PyTorch, or TensorFlow versions. |
| Experiment Setup | Yes | For network architectures, we orient ourselves on Radenovic et al. (2022) and use single feature nets with [64, 32, 32] neurons, ReLU activation, and 0.1 dropout after each layer. We use the same architecture for all models and employ an embedding size of 64 for NATT as well as 4 transformer blocks. We start with a learning rate of 1e-03 and implement learning rate decay with a patience of 15 epochs and early stopping after 25 epochs of no improvement in the validation loss. All results are achieved with the model's best-performing weights on the validation dataset. |
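The Experiment Setup row describes single-feature subnetworks with [64, 32, 32] hidden neurons, ReLU activation, and 0.1 dropout after each layer. A minimal NumPy sketch of such a subnetwork is shown below; the function names, initialization scheme, and one-dimensional output are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_feature_net(in_dim, hidden=(64, 32, 32), out_dim=1):
    """Weights for one single-feature subnetwork with [64, 32, 32]
    hidden units, as described in the Experiment Setup row.
    (He-style initialization is an illustrative assumption.)"""
    dims = (in_dim, *hidden, out_dim)
    return [(rng.standard_normal((d_in, d_out)) * np.sqrt(2.0 / d_in),
             np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def forward(params, x, dropout=0.1, train=False):
    """Forward pass: ReLU and 0.1 dropout after each hidden layer
    (dropout active only during training, with inverted scaling)."""
    h = x
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:          # hidden layers only
            h = np.maximum(h, 0.0)       # ReLU
            if train:
                mask = rng.random(h.shape) >= dropout
                h = h * mask / (1.0 - dropout)
    return h

net = make_feature_net(in_dim=1)
out = forward(net, np.ones((4, 1)))
print(out.shape)  # (4, 1)
```

In an additive model of this kind, one such subnetwork per feature produces a scalar contribution, and the contributions are summed to form the prediction.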
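The evaluation protocol quoted in the table (5-fold cross validation, reporting average performance and standard deviation, with RMSE for regression) can be sketched as follows. This is a generic illustration with a toy predict-the-mean model; the helper names and the toy data are assumptions, not taken from the paper.

```python
import numpy as np

def five_fold_cv(X, y, fit, score, k=5, seed=0):
    """k-fold cross validation: shuffle indices, split into k folds,
    train on k-1 folds, score on the held-out fold, and report the
    mean and standard deviation of the fold scores."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        scores.append(score(model, X[test], y[test]))
    return float(np.mean(scores)), float(np.std(scores))

# Toy regression example: a predict-the-mean "model" scored with RMSE.
X = np.arange(100, dtype=float).reshape(-1, 1)
y = X.ravel() * 0.5
fit = lambda Xtr, ytr: ytr.mean()            # "training" = store the mean
rmse = lambda m, Xte, yte: float(np.sqrt(np.mean((yte - m) ** 2)))
mean_rmse, std_rmse = five_fold_cv(X, y, fit, rmse)
print(round(mean_rmse, 2), round(std_rmse, 2))
```

For the classification datasets the same loop applies with AUC in place of RMSE as the scoring function.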