Sparse Interaction Additive Networks via Feature Interaction Detection and Sparse Selection

Authors: James Enouen, Yan Liu

NeurIPS 2022

For each reproducibility variable below, the assessed result is listed together with the supporting LLM response.
Research Type: Experimental
LLM Response: Our experiments focus on seven machine learning datasets. Two are in the classification setting, the MIMIC-III Healthcare and Higgs datasets [20, 3]. The other five are in the regression setting, namely the Appliances Energy, Bike Sharing, California Housing Prices, Wine Quality, and Song Year datasets [6, 14, 22, 12, 4]. More details about each dataset are provided in Table 1. We evaluate the regression datasets using mean-squared error (MSE). We measure performance on the classification datasets using both the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC).
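As a concrete illustration of these metrics, here is a minimal sketch using scikit-learn's metric functions; the arrays are hypothetical placeholders, not results from the paper.

```python
# Hedged sketch: computing the evaluation metrics named above with scikit-learn.
# The y_* arrays below are hypothetical placeholders.
import numpy as np
from sklearn.metrics import mean_squared_error, roc_auc_score, average_precision_score

# Regression datasets: mean-squared error (MSE).
y_true_reg = np.array([3.2, 1.5, 0.7])
y_pred_reg = np.array([3.0, 1.8, 0.5])
mse = mean_squared_error(y_true_reg, y_pred_reg)

# Classification datasets: AUROC and AUPRC (average precision used as the PR-curve summary).
y_true_clf = np.array([0, 1, 1, 0, 1])
y_score_clf = np.array([0.1, 0.8, 0.65, 0.3, 0.9])
auroc = roc_auc_score(y_true_clf, y_score_clf)
auprc = average_precision_score(y_true_clf, y_score_clf)

print(f"MSE={mse:.3f}  AUROC={auroc:.3f}  AUPRC={auprc:.3f}")
```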
Researcher Affiliation: Academia
LLM Response: James Enouen, Department of Computer Science, University of Southern California, Los Angeles, CA, enouen@usc.edu; Yan Liu, Department of Computer Science, University of Southern California, Los Angeles, CA, yanliu.cs@usc.edu
Pseudocode: Yes
LLM Response: Algorithm 1: Feature Interaction Selection (FIS)
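To give a flavor of what such a procedure looks like, below is a generic, hedged sketch of a hierarchical feature-interaction selection loop; the `interaction_strength` scoring function is a hypothetical placeholder, and this is not the authors' exact Algorithm 1.

```python
# Hedged sketch of a generic feature-interaction selection loop.
# `interaction_strength` is a hypothetical scoring function supplied by the caller;
# this is NOT the authors' exact Algorithm 1 from the paper.
from itertools import combinations

def select_interactions(n_features, max_order, theta, interaction_strength):
    """Keep feature subsets whose interaction score exceeds `theta`,
    considering a subset only if all of its immediate sub-subsets were kept
    (a common hierarchical/heredity constraint)."""
    selected = {(i,) for i in range(n_features)
                if interaction_strength((i,)) > theta}
    for order in range(2, max_order + 1):
        for subset in combinations(range(n_features), order):
            parents_kept = all(tuple(sorted(set(subset) - {i})) in selected
                               for i in subset)
            if parents_kept and interaction_strength(subset) > theta:
                selected.add(subset)
    return selected
```

The heredity constraint keeps the search tractable by pruning higher-order candidates whose lower-order parts were already rejected.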
Open Source Code: Yes
LLM Response: Available at github.com/EnouenJ/sparse-interaction-additive-networks
Open Datasets: Yes
LLM Response: Our experiments focus on seven machine learning datasets. Two are in the classification setting, the MIMIC-III Healthcare and Higgs datasets [20, 3]. The other five are in the regression setting, namely the Appliances Energy, Bike Sharing, California Housing Prices, Wine Quality, and Song Year datasets [6, 14, 22, 12, 4].
Dataset Splits: Yes
LLM Response: All models are evaluated on a held-out test dataset over five folds of training-validation splits, except where three folds are specified: three folds are used for NODE-GAM on all datasets, and for Song Year and Higgs for all models. We respect previous test splits when applicable; otherwise we subdivide the data using an 80-20 split to generate a test set.
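A minimal sketch of this splitting scheme, assuming scikit-learn utilities and placeholder data (the per-dataset splits used in the paper may differ):

```python
# Hedged sketch: 80-20 held-out test split when no prior split exists,
# then five training/validation folds over the remaining data.
# The data below is a hypothetical placeholder.
import numpy as np
from sklearn.model_selection import train_test_split, KFold

X = np.random.rand(1000, 10)   # placeholder features
y = np.random.rand(1000)       # placeholder targets

# 80-20 split to carve out a held-out test set.
X_trval, X_test, y_trval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Five training/validation folds over the remaining data.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (tr_idx, val_idx) in enumerate(kf.split(X_trval)):
    X_tr, X_val = X_trval[tr_idx], X_trval[val_idx]
    y_tr, y_val = y_trval[tr_idx], y_trval[val_idx]
    # ... train on (X_tr, y_tr), tune hyperparameters on (X_val, y_val)
```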
Hardware Specification: Yes
LLM Response: Experiments are run on a machine with a GTX 1080 GPU and 16 GB of RAM.
Software Dependencies: No
LLM Response: The paper does not provide specific version numbers for any software dependencies used in the experiments (e.g., Python, PyTorch, TensorFlow, scikit-learn versions).
Experiment Setup: Yes
LLM Response: For the baseline DNNs we use hidden layer sizes [256, 128, 64] with ReLU activations. For the GAM subnetworks we use hidden layer sizes [16, 12, 8] with ReLU activations. We use L1 regularization of size 5e-5. The hyperparameter τ was taken to be 0.5 throughout, and θ was selected from a handful of potential values using a validation set. We train all networks using Adagrad with a learning rate of 5e-3.
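A hedged PyTorch sketch of this setup, using the layer sizes, L1 weight, and learning rate quoted above; the input dimension and the loss composition are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch of the described setup in PyTorch; the paper's actual code may differ.
# Layer sizes, L1 weight, and learning rate come from the quoted text;
# the input dimension (30) is a hypothetical placeholder.
import torch
import torch.nn as nn

def mlp(in_dim, hidden, out_dim=1):
    layers, prev = [], in_dim
    for h in hidden:
        layers += [nn.Linear(prev, h), nn.ReLU()]
        prev = h
    layers.append(nn.Linear(prev, out_dim))
    return nn.Sequential(*layers)

baseline_dnn = mlp(in_dim=30, hidden=[256, 128, 64])   # baseline DNN
gam_subnet = mlp(in_dim=1, hidden=[16, 12, 8])          # one GAM subnetwork

optimizer = torch.optim.Adagrad(baseline_dnn.parameters(), lr=5e-3)
l1_weight = 5e-5

def loss_fn(pred, target, model):
    # MSE loss (regression case) plus an L1 penalty on all parameters.
    mse = nn.functional.mse_loss(pred, target)
    l1 = sum(p.abs().sum() for p in model.parameters())
    return mse + l1_weight * l1
```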