Neural Additive Models: Interpretable Machine Learning with Neural Nets

Authors: Rishabh Agarwal, Levi Melnick, Nicholas Frosst, Xuezhou Zhang, Ben Lengerich, Rich Caruana, Geoffrey E. Hinton

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on regression and classification datasets show that NAMs are more accurate than widely used intelligible models such as logistic regression and shallow decision trees.
Researcher Affiliation | Collaboration | Rishabh Agarwal (Google Research, Brain Team); Levi Melnick (Microsoft Research); Nicholas Frosst (Cohere); Xuezhou Zhang (University of Wisconsin-Madison); Ben Lengerich (MIT); Rich Caruana (Microsoft Research); Geoffrey E. Hinton (Google Research, Brain Team)
Pseudocode | No | The paper includes architectural diagrams (e.g., Figure 1, Figure 7a, Figure 8) to illustrate the model structure, but it contains no explicitly labeled 'Pseudocode' or 'Algorithm' block and presents no structured, code-like steps. (A minimal sketch of the model structure appears below the table.)
Open Source Code | Yes | Source code is available at neural-additive-models.github.io.
Open Datasets | Yes | We report results on two widely used regression datasets, namely California Housing [27] for predicting housing prices and FICO [9] for understanding credit score predictions, as well as two classification datasets, namely Credit [7] for financial fraud detection and MIMIC-II [38] for predicting mortality in ICUs. In 2016, ProPublica released recidivism data [30] on defendants in Broward County, Florida.
Dataset Splits | Yes | We perform 5-fold cross validation to evaluate the accuracy of the learned models. Means and standard deviations are reported from 5-fold cross validation. (A sketch of this protocol appears below the table.)
Hardware Specification | No | The paper mentions that 'inference and training can be done on GPUs/TPUs or other specialized hardware', but neither the main text nor the appendix gives specifics such as exact GPU/CPU models, memory amounts, or the cloud resources used for the experiments.
Software Dependencies | No | The paper mentions the 'sklearn implementation [28]' and 'XGBoost implementation [6]' used for baselines and the 'Adam optimizer [20]' used for training, but it does not pin version numbers for these dependencies, which reproducibility requires.
Experiment Setup | Yes | Training and Evaluation. Feature nets in NAMs are selected amongst (1) DNNs containing 3 hidden layers with 64, 64 and 32 units and ReLU activation, and (2) single hidden layer NNs with 1024 ExU units and ReLU-1 activation. We perform 5-fold cross validation to evaluate the accuracy of the learned models. More details about training and evaluation protocols can be found in Section A.5 in the appendix. (From A.5:) All models are trained for 100 epochs using the Adam optimizer [20] with a learning rate of 0.001. ... The feature nets use 3 hidden layers with 64, 64 and 32 units respectively for ReLU units and 1 hidden layer with 1024 units for ExU units. ... For regularization, we use weight decay of 0.0001, output regularization of 0.0001, and dropout [41] of 0.2. (A training-loop sketch using these hyperparameters appears below the table.)
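
The Pseudocode row above notes that the paper describes its model only through diagrams. Below is a minimal sketch of that structure as described: one small net per feature whose outputs are summed, using the ExU units and ReLU-1 activation quoted in the Experiment Setup row. This is written in PyTorch rather than the authors' released TensorFlow code; the class names are illustrative, and the N(4, 0.5) weight initialization follows the range the paper suggests rather than an exact value it fixes.

```python
import torch
import torch.nn as nn


class ExULayer(nn.Module):
    """Exp-centered hidden layer: h(x) = act((x - b) @ exp(W)).

    Exponentiating the weights lets a unit fit the sharp jumps that
    standard ReLU nets tend to smooth over; the paper pairs ExU with
    ReLU-1 (ReLU clipped at 1) so each unit stays active only in a
    small region of the feature's range.
    """

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # The paper suggests initializing these weights around N(4, 0.5),
        # so exp(W) starts out large.
        self.weight = nn.Parameter(torch.normal(4.0, 0.5, (in_features, out_features)))
        self.bias = nn.Parameter(torch.normal(0.0, 0.5, (in_features,)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ReLU-1: clip the ReLU output at 1.
        return torch.clamp(torch.relu((x - self.bias) @ self.weight.exp()), max=1.0)


class FeatureNet(nn.Module):
    """One shape function f_i: scalar feature in, scalar contribution out."""

    def __init__(self, hidden_units: int = 1024, dropout: float = 0.2):
        super().__init__()
        self.hidden = ExULayer(1, hidden_units)
        self.dropout = nn.Dropout(dropout)  # the 0.2 dropout quoted from A.5
        self.linear = nn.Linear(hidden_units, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(self.dropout(self.hidden(x)))


class NAM(nn.Module):
    """g(E[y]) = beta + sum_i f_i(x_i), with all f_i trained jointly."""

    def __init__(self, num_features: int):
        super().__init__()
        self.feature_nets = nn.ModuleList(FeatureNet() for _ in range(num_features))
        self.beta = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor):
        # x: [batch, num_features]. Keep the per-feature contributions:
        # they are exactly what makes the model interpretable.
        contribs = torch.cat(
            [net(x[:, i : i + 1]) for i, net in enumerate(self.feature_nets)], dim=1
        )
        return contribs.sum(dim=1) + self.beta, contribs
```

Returning the per-feature contributions alongside the prediction is the point of the architecture: each f_i can be plotted directly as a shape function, which is how the paper's interpretability figures are produced.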
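The Dataset Splits row quotes 5-fold cross validation with means and standard deviations. A minimal sketch of that protocol on California Housing (one of the paper's regression datasets, available through scikit-learn), with Ridge regression standing in for whatever model is under evaluation; only the split-and-aggregate logic is taken from the paper:

```python
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

X, y = fetch_california_housing(return_X_y=True)

# 5-fold CV: train on 4 folds, score on the held-out fold, then report
# the mean and standard deviation over the 5 held-out scores.
rmses = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = Ridge().fit(X[train_idx], y[train_idx])  # stand-in model
    pred = model.predict(X[test_idx])
    rmses.append(mean_squared_error(y[test_idx], pred) ** 0.5)

print(f"RMSE: {np.mean(rmses):.3f} +/- {np.std(rmses):.3f}")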
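Finally, a training-loop sketch wired to the Section A.5 hyperparameters quoted above (Adam, learning rate 0.001, 100 epochs, weight decay 0.0001, output regularization 0.0001; the 0.2 dropout already sits inside FeatureNet). It reuses the NAM class from the first sketch. Interpreting 'output regularization' as a mean squared penalty on the per-feature contributions, and routing weight decay through Adam's weight_decay argument, are assumptions on my part rather than details the paper pins down:

```python
import torch


def train_nam(model, loader, epochs: int = 100):
    """Fit a NAM for regression; `model` is the NAM class sketched above."""
    # Assumption: weight decay is applied via Adam's built-in L2 term.
    opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
    mse = torch.nn.MSELoss()
    for _ in range(epochs):
        for xb, yb in loader:  # xb: [batch, num_features], yb: [batch]
            pred, contribs = model(xb)
            # Output regularization: penalize large per-feature
            # contributions (one reading of the quoted 0.0001 coefficient).
            loss = mse(pred, yb) + 1e-4 * contribs.pow(2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```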