Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Interpretable Additive Tabular Transformer Networks

Authors: Anton Frederik Thielmann, Arik Reuter, Thomas Kneib, David Rügamer, Benjamin Säfken

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To validate its efficacy, we conduct experiments on multiple datasets and find that NATT performs on par with state-of-the-art methods on tabular data and surpasses other interpretable approaches. We validate the effectiveness of our model on 8 machine learning benchmark datasets for both classification and regression. We perform 5-fold cross validation on all datasets and report the average performance as well as the standard deviations. For the classification tasks we report the Area under the curve (AUC). For the regression tasks we report the root mean squared error (RMSE).
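The quoted passage names the evaluation metrics: AUC for classification and RMSE for regression. As a small illustration of the regression metric (our own sketch, not code from the paper; the function name is ours), RMSE can be computed as:

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, the regression metric reported in the paper.

    Illustrative helper only; the paper does not publish its metric code.
    """
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))
```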
Researcher Affiliation | Academia | Anton Frederik Thielmann (EMAIL), Institute of Mathematics, Clausthal University of Technology; Arik Reuter (EMAIL), Institute of Mathematics, Clausthal University of Technology; Thomas Kneib (EMAIL), Chair of Statistics and Campus Institute Data Science, Georg-August-Universität Göttingen; David Rügamer (EMAIL), Department of Statistics, LMU Munich, Munich Center for Machine Learning (MCML); Benjamin Säfken (EMAIL), Institute of Mathematics, Clausthal University of Technology
Pseudocode | No | The paper describes the methodology and model architecture using mathematical formulas and diagrams, such as Figure 1 showing the NATT model architecture. However, there are no explicitly labeled 'Pseudocode' or 'Algorithm' sections or code-like formatted procedures.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology, nor does it provide a direct link to a code repository. While supplementary material is mentioned for data and hyperparameters, it does not explicitly state that code is included there.
Open Datasets | Yes | Classification datasets. We report performance on the Adult dataset for predicting a person's income (Kohavi et al., 1996), the Titanic dataset retrieved from Kaggle, for predicting the survival of Titanic passengers, the Churn dataset retrieved from Kaggle, covering whether a customer left a bank or not, and the Insurance dataset. Regression datasets. We report performances on another Insurance dataset (Lantz, 2019) and 2 Airbnb datasets with data from the cities of Munich and Amsterdam. Lastly we include the Abalone dataset retrieved from the UCI (Dua and Graff, 2017) as a dataset with only a single categorical variable and 3 categories.
Dataset Splits | Yes | We perform 5-fold cross validation on all datasets and report the average performance as well as the standard deviations.
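The quoted protocol is standard 5-fold cross-validation. As a sketch of how such fold indices can be generated (our own illustration; the paper's splitting code is not public), each fold serves once as the validation set:

```python
def kfold_indices(n_samples, k=5):
    """Yield (train_indices, val_indices) pairs for k-fold cross-validation.

    Illustrative sketch only; not taken from the paper's implementation.
    """
    indices = list(range(n_samples))
    # Spread any remainder over the first n_samples % k folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size
```

Averaging the per-fold metric (and reporting its standard deviation) then gives the numbers quoted above.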
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. It describes the experimental setup in terms of models, datasets, and hyperparameters, but omits hardware specifications.
Software Dependencies | No | The paper mentions software like XGBoost, LightGBM, and neural network frameworks. For XGBoost, it refers to "the implementation provided by Chen and Guestrin (2016)". However, it does not provide specific version numbers for any of the software dependencies or libraries used, such as Python, PyTorch, or TensorFlow versions.
Experiment Setup | Yes | For network architectures, we orient ourselves on Radenovic et al. (2022) and use single feature nets with [64, 32, 32] neurons, ReLU activation, and 0.1 dropout after each layer. We use the same architecture for all models and employ an embedding size of 64 for NATT as well as 4 transformer blocks. We start with a learning rate of 1e-03 and implement learning rate decay with a patience of 15 epochs and early stopping after 25 epochs of no improvement in the validation loss. All results are achieved with the model's best-performing weights on the validation dataset.
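The hyperparameters quoted in this row can be collected into a single configuration for reference. The dictionary below is our summary; the key names are our own, while the values are taken directly from the quoted text:

```python
# Summary of the training configuration reported in the paper's experiment
# setup. Key names are illustrative; values come from the quoted description.
NATT_EXPERIMENT_CONFIG = {
    "feature_net_hidden_units": [64, 32, 32],  # per-feature MLP layer sizes
    "activation": "ReLU",
    "dropout": 0.1,                            # applied after each layer
    "embedding_size": 64,                      # NATT embedding dimension
    "transformer_blocks": 4,
    "initial_learning_rate": 1e-3,
    "lr_decay_patience_epochs": 15,
    "early_stopping_patience_epochs": 25,
    "cv_folds": 5,                             # 5-fold cross-validation
}
```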