Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Sparse Interaction Additive Networks via Feature Interaction Detection and Sparse Selection
Authors: James Enouen, Yan Liu
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments focus on seven machine learning datasets. Two are in the classification setting, the MIMIC-III Healthcare and the Higgs datasets [20, 3]. The other five are in the regression setting, namely the Appliances Energy, Bike Sharing, California Housing Prices, Wine Quality, and Song Year datasets [6, 14, 22, 12, 4]. More details about each dataset are provided in Table 1. We evaluate the regression datasets using mean-squared error (MSE). We measure the performance on the classification datasets using both the area under the receiver operating characteristic (AUROC) and the area under the precision-recall curve (AUPRC) metrics. |
| Researcher Affiliation | Academia | James Enouen Department of Computer Science University of Southern California Los Angeles, CA EMAIL Yan Liu Department of Computer Science University of Southern California Los Angeles, CA EMAIL |
| Pseudocode | Yes | Algorithm 1 Feature Interaction Selection (FIS) |
| Open Source Code | Yes | Available at github.com/EnouenJ/sparse-interaction-additive-networks |
| Open Datasets | Yes | Our experiments focus on seven machine learning datasets. Two are in the classification setting, the MIMIC-III Healthcare and the Higgs datasets [20, 3]. The other five are in the regression setting, namely the Appliances Energy, Bike Sharing, California Housing Prices, Wine Quality, and Song Year datasets [6, 14, 22, 12, 4]. |
| Dataset Splits | Yes | All models are evaluated on a held-out test dataset over five folds of training-validation split unless three folds are specified. Three folds are used for NODE-GAM on all datasets as well as Song Year and Higgs for all models. We respect previous testing splits when applicable, otherwise we subdivide the data using an 80-20 split to generate a testing set. |
| Hardware Specification | Yes | Experiments are run with a machine using a GTX 1080 GPU and 16GB of RAM. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments (e.g., Python, PyTorch, TensorFlow, scikit-learn versions). |
| Experiment Setup | Yes | For the baseline DNNs we are using hidden layer sizes [256,128,64] with ReLU activations. For the GAM subnetworks we are using hidden layer sizes [16,12,8] with ReLU activations. We use L1 regularization of size 5e-5. The hyperparameter τ was taken to be 0.5 throughout and θ was selected from a handful of potential values using a validation set. We train all networks using Adagrad with a learning rate of 5e-3. |
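
The Dataset Splits row describes an 80-20 split to create a held-out test set, followed by five folds of training-validation split. The sketch below shows one way such a protocol could be laid out using only the Python standard library. It is not the authors' code: the function name `holdout_then_folds`, the seed, and the choice to build the folds as a disjoint k-fold partition of the non-test data are all assumptions, since the quoted text does not say how the folds are constructed.

```python
import random


def holdout_then_folds(n_samples, n_folds=5, test_fraction=0.2, seed=0):
    """Hypothetical helper: hold out a test set, then build train/validation folds.

    Returns (test_idx, folds), where folds is a list of (train_idx, val_idx) pairs.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)

    # 80-20 split: the first 20% of the shuffled indices become the test set.
    n_test = int(round(n_samples * test_fraction))
    test_idx = indices[:n_test]
    dev_idx = indices[n_test:]

    # Assumption: folds are a k-fold partition of the remaining data, so every
    # non-test sample is used for validation in exactly one fold.
    folds = []
    for k in range(n_folds):
        val_idx = dev_idx[k::n_folds]
        val_set = set(val_idx)
        train_idx = [i for i in dev_idx if i not in val_set]
        folds.append((train_idx, val_idx))
    return test_idx, folds


test_idx, folds = holdout_then_folds(n_samples=1000)
print(len(test_idx), [(len(train), len(val)) for train, val in folds])
```

With 1000 samples this gives 200 test indices and five folds of 640 training and 160 validation indices each. Passing `n_folds=3` would correspond to the three-fold setting the row mentions for NODE-GAM, Song Year, and Higgs. Where a dataset already has a published test split, the row states that split is respected instead, so only the fold construction would apply.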