Improving Neural Additive Models with Bayesian Principles
Authors: Kouroche Bouchiat, Alexander Immer, Hugo Yèche, Gunnar Rätsch, Vincent Fortuin
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | we show improved empirical performance on tabular datasets and challenging real-world medical tasks. |
| Researcher Affiliation | Collaboration | 1 ETH Zürich, Zürich, Switzerland; 2 Max Planck Institute for Intelligent Systems, Tübingen, Germany; 3 Helmholtz AI, Munich, Germany; 4 TU Munich, Munich, Germany; 5 Munich Center for Machine Learning, Munich, Germany. |
| Pseudocode | No | The paper describes mathematical and procedural steps but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: github.com/fortuinlab/LA-NAM |
| Open Datasets | Yes | We evaluate the proposed LA-NAM on a collection of synthetic and real-world datasets emphasizing its potential for supporting decision-making in the medical field...We benchmark the LA-NAM and baselines on the standard selection of UCI regression and binary classification datasets...We utilize the MIMIC-III patient database (Johnson et al., 2016) and employ the preprocessing outlined by Lengerich et al. (2022). Additionally, we leverage the HiRID database (Faltys et al., 2024) and adopt the pre-processing proposed by Yèche et al. (2021). |
| Dataset Splits | Yes | Each dataset is split into five cross-validation folds and the mean and standard error of model performance are reported across folds. We split off 12.5% of the training data as validation data for the NAM. (A sketch of this split protocol follows the table.) |
| Hardware Specification | Yes | The deep learning models are trained on a single NVIDIA RTX2080Ti with a Xeon E5-2630v4 core. Other models are trained on Xeon E5-2697v4 cores and Xeon Gold 6140 cores. |
| Software Dependencies | No | The paper mentions several software components like scikit-learn, pygam, Adam, LightGBM, and InterpretML, but it does not specify their version numbers. |
| Experiment Setup | Yes | The LA-NAM is constructed using feature networks containing a single hidden layer of 64 neurons with GELU activation (Hendrycks & Gimpel, 2023)...We select the learning rate in the discrete set of {0.1, 0.01, 0.001} which maximizes the ultimate log-marginal likelihood. We use a batch size of 512 and perform early stopping on the log-marginal likelihood restoring the best scoring parameters and hyperparameters at the end of training. We find that the algorithm is fairly robust to the choice of hyperparameter optimization schedule: For all experiments, we use 0.1 for the hyperparameter learning rate and perform batches of 30 gradient steps on the log-marginal likelihood every 100 epochs of regular training. (A configuration sketch follows the table.) |
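
The Dataset Splits row above describes five-fold cross-validation with 12.5% of each training fold held out as validation data. Below is a minimal sketch of that protocol, assuming NumPy arrays and a hypothetical model interface (`make_model`, `fit`, `score`); it illustrates the described split and is not the paper's released evaluation code.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

def cross_validate(X, y, make_model, seed=0):
    """Mean and standard error of a score across five cross-validation folds."""
    scores = []
    for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=seed).split(X):
        # Hold out 12.5% of the training fold as validation data (e.g. for early stopping).
        X_tr, X_val, y_tr, y_val = train_test_split(
            X[train_idx], y[train_idx], test_size=0.125, random_state=seed
        )
        model = make_model()
        model.fit(X_tr, y_tr, X_val, y_val)  # hypothetical fit signature taking validation data
        scores.append(model.score(X[test_idx], y[test_idx]))
    scores = np.asarray(scores)
    return scores.mean(), scores.std(ddof=1) / np.sqrt(len(scores))
```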
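
The Experiment Setup row describes the LA-NAM feature networks (one hidden layer of 64 GELU units per feature) and the interleaved optimization schedule (batch size 512, parameter learning rate from {0.1, 0.01, 0.001}, hyperparameter learning rate 0.1, and 30 log-marginal-likelihood steps every 100 epochs). The sketch below illustrates that structure in PyTorch under stated assumptions: `log_marginal_likelihood` is a hypothetical stand-in for the paper's Laplace-approximation objective, and early stopping on that objective is omitted for brevity.

```python
import torch
import torch.nn as nn

class FeatureNet(nn.Module):
    """One additive component: a single hidden layer of 64 GELU units."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, 64), nn.GELU(), nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x)

class AdditiveModel(nn.Module):
    """Sum of per-feature subnetworks plus a bias (NAM-style structure)."""
    def __init__(self, num_features):
        super().__init__()
        self.feature_nets = nn.ModuleList([FeatureNet() for _ in range(num_features)])
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        contributions = [f(x[:, i:i + 1]) for i, f in enumerate(self.feature_nets)]
        return torch.stack(contributions).sum(dim=0) + self.bias

def train(model, loader, log_marginal_likelihood, hyperparams, lr=0.01, epochs=1000):
    # Parameter learning rate chosen from {0.1, 0.01, 0.001}; the loader uses batch size 512.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    hyper_opt = torch.optim.Adam(hyperparams, lr=0.1)  # hyperparameter learning rate of 0.1
    loss_fn = nn.MSELoss()
    for epoch in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
        # Every 100 epochs: 30 gradient steps maximizing the log-marginal likelihood.
        if (epoch + 1) % 100 == 0:
            for _ in range(30):
                hyper_opt.zero_grad()
                (-log_marginal_likelihood(model, hyperparams)).backward()
                hyper_opt.step()
```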