Interpretable Mesomorphic Networks for Tabular Data

Authors: Arlind Kadra, Sebastian Pineda Arango, Josif Grabocka

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments, we demonstrate that our explainable deep networks have comparable performance to state-of-the-art classifiers on tabular data and outperform current existing methods that are explainable by design.
Researcher Affiliation | Academia | Arlind Kadra, Department of Representation Learning, University of Freiburg, kadraa@cs.uni-freiburg.de; Sebastian Pineda Arango, Department of Representation Learning, University of Freiburg, pineda@cs.uni-freiburg.de; Josif Grabocka, Department of Machine Learning, University of Technology Nuremberg, josif.grabocka@utn.de
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | We make our implementation publicly available. (Footnote 4: Source code at https://github.com/ArlindKadra/IMN)
Open Datasets | Yes | We run our predictive accuracy experiments on the AutoML benchmark, which includes 35 diverse classification problems... For more details about the datasets included in our experiments, we point the reader to Appendix C. (Appendix C includes Table 6 with statistics on the AutoML benchmark datasets, including the dataset IDs that can be accessed from OpenML.)
Dataset Splits | Yes | All datasets have a train/validation set ratio of 10 to 1. (A data-loading and splitting sketch follows the table.)
Hardware Specification | Yes | Lastly, the methods that offer GPU support are run on a single NVIDIA RTX 2080 Ti, while the rest of the methods are run on an AMD EPYC 7502 32-core processor.
Software Dependencies | No | The paper mentions PyTorch as the main library, along with scikit-learn, Optuna, and specific implementations of CatBoost, TabNet, and NAM. However, it does not provide version numbers for these dependencies (e.g., 'PyTorch 1.x' or 'scikit-learn 0.y').
Experiment Setup | Yes | For the default hyperparameters of our method, we use 2 residual blocks and 128 units per layer combined with the GELU activation (Hendrycks & Gimpel, 2016). When training our network, we use snapshot ensembling (Huang et al., 2017) combined with cosine annealing with restarts (Loshchilov & Hutter, 2019). We use a learning rate and weight decay value of 0.01, where the learning rate is warmed up to 0.01 over the first 5 epochs, a dropout value of 0.25, and an L1 penalty of 0.1 on the weights. Our network is trained for 500 epochs with a batch size of 64. (A configuration sketch follows the table.)
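
As referenced in the Open Datasets and Dataset Splits rows, the evaluation draws on OpenML-hosted AutoML-benchmark datasets with a 10:1 train/validation ratio. The following is a minimal sketch of that setup, assuming scikit-learn's OpenML fetcher; the dataset ID is a placeholder rather than one confirmed from Appendix C, and the stratified split and random seed are assumptions, not details stated in the paper.

```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

# Placeholder OpenML dataset ID; the benchmark's actual IDs are listed in Appendix C.
X, y = fetch_openml(data_id=31, as_frame=True, return_X_y=True)

# A 10:1 train/validation ratio puts 1/11 of the rows in the validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=1 / 11, stratify=y, random_state=0
)
print(f"train: {len(X_train)} rows, validation: {len(X_val)} rows")
```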
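
The Experiment Setup row lists the default configuration: 2 residual blocks of 128 GELU units, dropout 0.25, learning rate and weight decay of 0.01 with a 5-epoch warm-up, cosine annealing with restarts, an L1 penalty of 0.1 on the weights, 500 epochs, and batch size 64. The sketch below wires those values into a generic residual-MLP backbone in PyTorch; it is not the authors' IMN hypernetwork (see https://github.com/ArlindKadra/IMN for the reference implementation), and the input/output sizes, restart period `T_0`, and warm-up start factor are assumptions.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One residual block of the given width with GELU activations and dropout."""
    def __init__(self, width: int, dropout: float):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(width, width), nn.GELU(), nn.Dropout(dropout),
            nn.Linear(width, width), nn.GELU(), nn.Dropout(dropout),
        )

    def forward(self, x):
        return x + self.body(x)  # skip connection

def build_backbone(num_features, num_outputs, width=128, num_blocks=2, dropout=0.25):
    # Default hyperparameters from the paper: 2 residual blocks, 128 units, dropout 0.25.
    layers = [nn.Linear(num_features, width), nn.GELU()]
    layers += [ResidualBlock(width, dropout) for _ in range(num_blocks)]
    layers += [nn.Linear(width, num_outputs)]
    return nn.Sequential(*layers)

model = build_backbone(num_features=20, num_outputs=2)  # placeholder sizes
optimizer = torch.optim.AdamW(model.parameters(), lr=0.01, weight_decay=0.01)

# 5-epoch linear warm-up to lr = 0.01, then cosine annealing with restarts
# (the restart period T_0 and the warm-up start factor are assumptions).
warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1, total_iters=5)
cosine = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=50)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[5]
)

def loss_with_l1(logits, targets, model, l1_coef=0.1):
    # Cross-entropy plus the L1 penalty of 0.1 on the weights reported in the paper.
    ce = nn.functional.cross_entropy(logits, targets)
    l1 = sum(p.abs().sum() for p in model.parameters())
    return ce + l1_coef * l1

# Training would run for 500 epochs with batches of 64, stepping `scheduler`
# once per epoch and taking snapshot-ensemble checkpoints around the restarts.
```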