Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Interpretable Additive Tabular Transformer Networks

Authors: Anton Frederik Thielmann, Arik Reuter, Thomas Kneib, David Rügamer, Benjamin Säfken

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To validate its efficacy, we conduct experiments on multiple datasets and find that NATT performs on par with state-of-the-art methods on tabular data and surpasses other interpretable approaches. We validate the effectiveness of our model on 8 machine learning benchmark datasets for both classification and regression. We perform 5-fold cross validation on all datasets and report the average performance as well as the standard deviations. For the classification tasks we report the Area under the curve (AUC). For the regression tasks we report the root mean squared error (RMSE).
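The quoted passage names the evaluation metrics: AUC for classification and RMSE for regression. As a small illustration of the regression metric (our own sketch, not code from the paper; the function name is ours), RMSE can be computed as:

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, the regression metric reported in the paper.

    Illustrative helper only; the paper does not publish its metric code.
    """
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))
```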
Researcher Affiliation | Academia | Anton Frederik Thielmann (EMAIL), Institute of Mathematics, Clausthal University of Technology; Arik Reuter (EMAIL), Institute of Mathematics, Clausthal University of Technology; Thomas Kneib (EMAIL), Chair of Statistics and Campus Institute Data Science, Georg-August-Universität Göttingen; David Rügamer (EMAIL), Department of Statistics, LMU Munich, Munich Center for Machine Learning (MCML); Benjamin Säfken (EMAIL), Institute of Mathematics, Clausthal University of Technology
Pseudocode | No | The paper describes the methodology and model architecture using mathematical formulas and diagrams, such as Figure 1 showing the NATT model architecture. However, there are no explicitly labeled 'Pseudocode' or 'Algorithm' sections or code-like formatted procedures.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology, nor does it provide a direct link to a code repository. While supplementary material is mentioned for data and hyperparameters, it does not explicitly state that code is included there.
Open Datasets | Yes | Classification datasets. We report performance on the Adult dataset for predicting a person's income (Kohavi et al., 1996), the Titanic dataset retrieved from Kaggle, for predicting the survival of Titanic passengers, the Churn dataset retrieved from Kaggle, covering whether a customer left a bank or not, and the Insurance dataset. Regression datasets. We report performances on another Insurance dataset (Lantz, 2019) and 2 Airbnb datasets with data from the cities of Munich and Amsterdam. Lastly we include the Abalone dataset retrieved from the UCI (Dua and Graff, 2017) as a dataset with only a single categorical variable and 3 categories.
Dataset Splits | Yes | We perform 5-fold cross validation on all datasets and report the average performance as well as the standard deviations.
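The quoted protocol is standard 5-fold cross-validation. As a sketch of how such fold indices can be generated (our own illustration; the paper's splitting code is not public), each fold serves once as the validation set:

```python
def kfold_indices(n_samples, k=5):
    """Yield (train_indices, val_indices) pairs for k-fold cross-validation.

    Illustrative sketch only; not taken from the paper's implementation.
    """
    indices = list(range(n_samples))
    # Spread any remainder over the first n_samples % k folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size
```

Averaging the per-fold metric (and reporting its standard deviation) then gives the numbers quoted above.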
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. It describes the experimental setup in terms of models, datasets, and hyperparameters, but omits hardware specifications.
Software Dependencies | No | The paper mentions software like XGBoost, LightGBM, and neural network frameworks. For XGBoost, it refers to "the implementation provided by Chen and Guestrin (2016)". However, it does not provide specific version numbers for any of the software dependencies or libraries used, such as Python, PyTorch, or TensorFlow versions.
Experiment Setup | Yes | For network architectures, we orient ourselves on Radenovic et al. (2022) and use single feature nets with [64, 32, 32] neurons, ReLU activation, and 0.1 dropout after each layer. We use the same architecture for all models and employ an embedding size of 64 for NATT as well as 4 transformer blocks. We start with a learning rate of 1e-03 and implement learning rate decay with a patience of 15 epochs and early stopping after 25 epochs of no improvement in the validation loss. All results are achieved with the model's best-performing weights on the validation dataset.
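The hyperparameters quoted in this row can be collected into a single configuration for reference. The dictionary below is our summary; the key names are our own, while the values are taken directly from the quoted text:

```python
# Summary of the training configuration reported in the paper's experiment
# setup. Key names are illustrative; values come from the quoted description.
NATT_EXPERIMENT_CONFIG = {
    "feature_net_hidden_units": [64, 32, 32],  # per-feature MLP layer sizes
    "activation": "ReLU",
    "dropout": 0.1,                            # applied after each layer
    "embedding_size": 64,                      # NATT embedding dimension
    "transformer_blocks": 4,
    "initial_learning_rate": 1e-3,
    "lr_decay_patience_epochs": 15,
    "early_stopping_patience_epochs": 25,
    "cv_folds": 5,                             # 5-fold cross-validation
}
```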