InterpreTabNet: Distilling Predictive Signals from Tabular Data by Salient Feature Interpretation
Authors: Jacob Yoke Hong Si, Wendy Yusi Cheng, Michael Cooper, Rahul Krishnan
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through comprehensive experiments on real-world datasets, we demonstrate that InterpreTabNet outperforms previous methods for interpreting tabular data while attaining competitive accuracy. |
| Researcher Affiliation | Academia | 1University of Toronto 2Vector Institute. Correspondence to: Jacob Si <jacobyhsi@cs.toronto.edu>. |
| Pseudocode | Yes | Algorithm 1 Our proposed algorithm for interpretability optimization. Good default settings for the tested machine learning problems are α = 0, β = [0, 10000000], δ = [0.20, 0.25], γ = [2, 3], ϵ = [3, 5]. For β, δ and γ, it would depend on the nature of the dataset. More samples require higher parameter values. |
| Open Source Code | Yes | The code is available on GitHub at: https://github.com/jacobyhsi/InterpreTabNet |
| Open Datasets | Yes | The real-world tabular datasets we use in our experiments are from the UCI Machine Learning Repository (Kelly et al., 2023) and OpenML (Vanschoren et al., 2013). ... Table 6: Dataset Links |
| Dataset Splits | Yes | The training/validation/testing proportion of the datasets for each split is 80/10/10% apart from the Higgs dataset. Due to the inherently large Higgs dataset, we adhere to TabNet's method of data splitting with 500k training samples, 100k validation samples, and 100k testing samples. |
| Hardware Specification | No | The paper discusses computational efficiency and training time (e.g., 'several-minute increase in training time') but does not specify any hardware components like GPU models, CPU types, or other specific computational resources used for the experiments. |
| Software Dependencies | Yes | The GPT-4 version used in our experiments is 'gpt-4-1106' with training data up to Apr 2023 and a context window of 128,000 tokens. |
| Experiment Setup | Yes | Hyperparameters such as Nd = Na, Nsteps, γ, and learning rate are tuned in the range per TabNet's recommendations. In terms of the sparsity regularizer for InterpreTabNet, r_M, we recommend a smaller range e.g. [0, 10,000] for datasets with a low to moderate number of features and samples (Adult dataset), and a larger range e.g. [0, 1,000,000,000,000] for datasets with a larger number of features and samples (Higgs dataset). Table 7: Hyperparameter spaces for all models |
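The 80/10/10 split described in the Dataset Splits row can be reproduced with a standard two-stage split. The sketch below is an illustrative assumption, not code from the paper's repository: the function name, random seed, and use of stratification are ours, and the Higgs dataset instead uses fixed 500k/100k/100k sample counts as quoted above.

```python
# Illustrative sketch of an 80/10/10 train/validation/test split
# for the non-Higgs datasets. Names and defaults here are assumptions.
from sklearn.model_selection import train_test_split

def split_80_10_10(X, y, seed=0):
    # Hold out 20% first, then split that portion evenly into val/test.
    X_train, X_hold, y_train, y_hold = train_test_split(
        X, y, test_size=0.20, random_state=seed, stratify=y)
    X_val, X_test, y_val, y_test = train_test_split(
        X_hold, y_hold, test_size=0.50, random_state=seed, stratify=y_hold)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```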
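The ranges quoted in the Pseudocode and Experiment Setup rows can be gathered into a single search-space specification. The dictionary below is a hedged sketch: the candidate values for N_d = N_a, N_steps, γ, and the learning rate are assumptions based on TabNet's published recommendations, not a copy of the paper's Table 7; only the r_M ranges come from the quoted text.

```python
# Sketch of a hyperparameter search space consistent with the quoted ranges.
# Values marked "assumed" are not taken from the paper's Table 7.
search_space = {
    "n_d_n_a":       [8, 16, 32, 64],     # N_d = N_a (assumed candidates)
    "n_steps":       [3, 4, 5],           # N_steps (assumed candidates)
    "gamma":         [1.0, 1.5, 2.0],     # relaxation γ (assumed candidates)
    "learning_rate": [2e-2, 1e-2, 5e-3],  # assumed candidates
    # Sparsity regularizer r_M, per the Experiment Setup row:
    "r_M": {
        "adult": (0, 10_000),             # low/moderate features and samples
        "higgs": (0, 1_000_000_000_000),  # larger features and samples
    },
}
```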