Sparse Interaction Additive Networks via Feature Interaction Detection and Sparse Selection
Authors: James Enouen, Yan Liu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments focus on seven machine learning datasets. Two are in the classification setting, the MIMIC-III Healthcare and the Higgs datasets [20, 3]. The other five are in the regression setting, namely the Appliances Energy, Bike Sharing, California Housing Prices, Wine Quality, and Song Year datasets [6, 14, 22, 12, 4]. More details about each dataset are provided in Table 1. We evaluate the regression datasets using mean-squared error (MSE). We measure the performance on the classification datasets using both the area under the receiver operating characteristic (AUROC) and the area under the precision-recall curve (AUPRC) metrics. |
| Researcher Affiliation | Academia | James Enouen, Department of Computer Science, University of Southern California, Los Angeles, CA (enouen@usc.edu); Yan Liu, Department of Computer Science, University of Southern California, Los Angeles, CA (yanliu.cs@usc.edu) |
| Pseudocode | Yes | Algorithm 1 Feature Interaction Selection (FIS) (a hedged sketch of one possible selection loop appears after this table) |
| Open Source Code | Yes | Available at github.com/EnouenJ/sparse-interaction-additive-networks |
| Open Datasets | Yes | Our experiments focus on seven machine learning datasets. Two are in the classification setting, the MIMIC-III Healthcare and the Higgs datasets [20, 3]. The other five are in the regression setting, namely the Appliances Energy, Bike Sharing, California Housing Prices, Wine Quality, and Song Year datasets [6, 14, 22, 12, 4]. |
| Dataset Splits | Yes | All models are evaluated on a held-out test dataset over five folds of training-validation split unless three folds are specified. Three folds are used for NODE-GAM on all datasets as well as Song Year and Higgs for all models. We respect previous testing splits when applicable, otherwise we subdivide the data using an 80-20 split to generate a testing set. (See the split sketch after the table.) |
| Hardware Specification | Yes | Experiments are run with a machine using a GTX 1080 GPU and 16GB of RAM. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments (e.g., Python, PyTorch, TensorFlow, scikit-learn versions). |
| Experiment Setup | Yes | For the baseline DNNs we are using hidden layer sizes [256,128,64] with ReLU activations. For the GAM subnetworks we are using hidden layer sizes [16,12,8] with ReLU activations. We use L1 regularization of size 5e-5. The hyperparameter τ was taken to be 0.5 throughout and θ was selected from a handful of potential values using a validation set. We train all networks using Adagrad with a learning rate of 5e-3. (A PyTorch sketch of these settings follows the table.) |
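The FIS pseudocode itself is not reproduced above. As a rough illustration only, the following Python sketch shows one plausible reading of a heredity-based greedy selection loop consistent with the paper's description: θ acts as an interaction-strength threshold and τ as the fraction of sub-interactions that must already be selected before a higher-order candidate is considered. The function `interaction_strength` and the default values are hypothetical stand-ins, not the paper's exact procedure.

```python
from itertools import combinations

def feature_interaction_selection(features, interaction_strength,
                                  theta=0.1, tau=0.5, max_order=3):
    """Hypothetical sketch of a heredity-based greedy FIS loop.

    `interaction_strength(subset)` is a stand-in for an interaction
    detection score; `theta` is the strength threshold (selected on a
    validation set in the paper) and `tau` the fraction of (k-1)-subsets
    that must already be selected before a k-way candidate is considered.
    """
    # Order 1: keep every main effect whose strength clears the threshold.
    selected = {frozenset([f]) for f in features
                if interaction_strength(frozenset([f])) >= theta}
    for k in range(2, max_order + 1):
        candidates = set()
        for combo in combinations(features, k):
            subsets = [frozenset(s) for s in combinations(combo, k - 1)]
            # Heredity pruning: require a tau-fraction of sub-interactions
            # to have survived the previous round.
            if sum(s in selected for s in subsets) >= tau * len(subsets):
                candidates.add(frozenset(combo))
        selected |= {c for c in candidates
                     if interaction_strength(c) >= theta}
    return selected
```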
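For the splitting protocol, a minimal sketch of the stated 80-20 test split followed by five training-validation folds might look as follows. scikit-learn is an assumption (the paper pins no software versions), and the random seed and array shapes are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split, KFold

# Assumed data arrays; shapes are illustrative only.
X, y = np.random.rand(1000, 10), np.random.rand(1000)

# 80-20 split to generate a held-out test set (seed is an assumption).
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Five folds of training-validation split over the remaining 80%.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(kfold.split(X_trainval)):
    X_train, X_val = X_trainval[train_idx], X_trainval[val_idx]
    y_train, y_val = y_trainval[train_idx], y_trainval[val_idx]
    # ... train on (X_train, y_train), select on (X_val, y_val) ...
```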
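The reported architecture and optimizer settings translate into roughly the following skeleton. PyTorch is an assumption (the released repository is Python-based), the input dimension is illustrative, and the helper `mlp` is hypothetical; only the layer sizes [256,128,64] and [16,12,8], the L1 weight 5e-5, and Adagrad with learning rate 5e-3 come from the paper.

```python
import torch
import torch.nn as nn

def mlp(in_dim, hidden, out_dim):
    """Stack of Linear+ReLU layers with a final linear head."""
    layers, d = [], in_dim
    for h in hidden:
        layers += [nn.Linear(d, h), nn.ReLU()]
        d = h
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

n_features = 10                                     # illustrative input size
baseline_dnn = mlp(n_features, [256, 128, 64], 1)   # baseline DNN
subnetwork = mlp(2, [16, 12, 8], 1)                 # one GAM subnetwork,
                                                    # e.g. a pairwise term

# Adagrad with learning rate 5e-3, as stated in the paper.
optimizer = torch.optim.Adagrad(baseline_dnn.parameters(), lr=5e-3)

def l1_penalty(model, lam=5e-5):
    """L1 regularization of size 5e-5 over all weights."""
    return lam * sum(p.abs().sum() for p in model.parameters())

# One regression training step would then combine MSE with the penalty:
# loss = nn.functional.mse_loss(baseline_dnn(x), y) + l1_penalty(baseline_dnn)
```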