Neural Basis Models for Interpretability
Authors: Filip Radenovic, Abhimanyu Dubey, Dhruv Mahajan
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 Experiments |
| Researcher Affiliation | Industry | The authors are affiliated with Meta AI (Facebook Research); source code is available at github.com/facebookresearch/nbm-spam. |
| Pseudocode | No | The paper describes its architecture and methodology using mathematical equations and diagrams (e.g., Figure 1) but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks. A hedged PyTorch sketch of the described architecture follows the table. |
| Open Source Code | Yes | Source code is available at github.com/facebookresearch/nbm-spam. |
| Open Datasets | Yes | Tabular datasets. We report performance on CA Housing [10, 46], FICO [22], Cover Type [8, 16, 20], and Newsgroups [32, 43] tabular datasets. ... We also report performance on MIMIC-II [41, 51], Credit [17, 19], Click [15], Epsilon [21], Higgs [3, 26], Microsoft [40, 49], Yahoo [60], and Year [66] tabular datasets. ... Image datasets (classification). We experiment with two bird classification datasets: CUB [18, 62] and iNaturalist Birds [27, 59]. ... Image dataset (object detection). For this task we use a proprietary object detection dataset, denoted as Common Objects |
| Dataset Splits | Yes | Data is split to have 70/10/20 ratio for training, validation, and testing, respectively; except for Newsgroups where the test split is fixed, so we only split the train part to 85/15 ratio for train and validation. ... For these datasets, we follow [12, 47] to use the same training, validation, and testing splits. A minimal split sketch is included after the table. |
| Hardware Specification | Yes | Linear, NAM, NBM, and MLP models are trained using the Adam with decoupled weight decay (AdamW) optimizer [35], on 8 V100 GPU machines with 32 GB memory, and a batch size of at most 1024 per GPU (divided by 2 every time a batch cannot fit in the memory). ... The throughput is measured as the number of input instances that we can process per second (x / sec) on one 32 GB V100 GPU, in inference mode. ... Finally, for EBMs and XGBoost, CPU machines are used. A rough throughput-measurement sketch appears after the table. |
| Software Dependencies | No | We implement the following baselines in PyTorch [48]... The paper mentions using PyTorch but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We use mean squared error (MSE) for regression, and cross-entropy loss for classification. To avoid overfitting, we use the following regularization techniques: (i) L2-normalization (weight decay) [31] of parameters; (ii) batch-norm [28] and dropout [54] on hidden layers of the basis functions network; (iii) an L2-normalization penalty on the outputs f_i to incentivize fewer strong feature contributions, as done in [2]; (iv) basis dropout to randomly drop individual basis functions in order to decorrelate them. ... MLP containing 3 hidden layers with 256, 128, and 128 units, ReLU [23], B = 100 basis outputs for NBMs and B = 200 for NB2Ms. ... Linear, NAM, NBM, and MLP models are trained using the Adam with decoupled weight decay (AdamW) optimizer [35], on 8 V100 GPU machines with 32 GB memory, and a batch size of at most 1024 per GPU (divided by 2 every time a batch cannot fit in the memory). We train for 1,000, 500, 100, or 50 epochs, depending on the size and feature dimensionality of the dataset. The learning rate is decayed with cosine annealing [34] from the starting value until zero. For NBMs on all datasets, we tune the starting learning rate in the continuous interval [1e-5, 1.0), weight decay in the interval [1e-10, 1.0), output penalty coefficient in the interval [1e-7, 100), dropout and basis dropout coefficients in the discrete set {0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}. We find optimal hyper-parameters using the validation set and random search. A hedged sketch of this training recipe follows the table. |
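Since the paper gives no pseudocode (see the "Pseudocode" row), the following is a minimal PyTorch sketch of an NBM-style forward pass as described in the paper: one shared MLP maps each scalar feature to B basis outputs, and learned per-feature weights combine those bases into additive shape functions. The class name `NBMSketch`, the weight initialization, and the exact wiring are assumptions for illustration; the authors' reference implementation is at github.com/facebookresearch/nbm-spam.

```python
import torch
import torch.nn as nn


class NBMSketch(nn.Module):
    """Toy additive model with a shared basis network, NBM-style (a sketch, not the authors' code)."""

    def __init__(self, num_features: int, num_bases: int = 100, num_outputs: int = 1):
        super().__init__()
        # One shared MLP maps a single scalar feature to B basis values
        # (hidden sizes 256/128/128 follow the paper's description).
        self.bases = nn.Sequential(
            nn.Linear(1, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, num_bases),
        )
        # Per-feature, per-output weights that combine the shared bases.
        self.weights = nn.Parameter(0.01 * torch.randn(num_features, num_bases, num_outputs))
        self.bias = nn.Parameter(torch.zeros(num_outputs))

    def forward(self, x):
        # x: (batch, num_features) -> evaluate the shared bases on each feature separately.
        h = self.bases(x.unsqueeze(-1))              # (batch, features, bases)
        # Shape functions f_i(x_i) as linear combinations of the shared bases.
        f = torch.einsum("nfb,fbo->nfo", h, self.weights)
        return f.sum(dim=1) + self.bias              # additive prediction


if __name__ == "__main__":
    model = NBMSketch(num_features=8)                # e.g., CA Housing has 8 features
    print(model(torch.randn(32, 8)).shape)           # torch.Size([32, 1])
```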
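The 70/10/20 split quoted in the "Dataset Splits" row can be reproduced in spirit with scikit-learn. The snippet below is a minimal sketch using the CA Housing data as an example; the authors' actual splitting code and random seeds are not given in that row.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

X, y = fetch_california_housing(return_X_y=True)

# Hold out 20% for testing, then 1/8 of the remaining 80% (i.e., 10% overall)
# for validation, giving roughly a 70/10/20 split.
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.125, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # approximately 70% / 10% / 20%
```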
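The "Experiment Setup" row lists AdamW, cosine annealing of the learning rate to zero, and random search over learning rate, weight decay, output penalty, and dropout. The sketch below illustrates one random-search trial under stated assumptions: `model` and `train_loader` are placeholders, the output L2 penalty and dropout values are sampled but not applied in this minimal loop, and this is not the authors' released training code.

```python
import random
import torch


def sample_hparams():
    """Log-uniform / discrete draws over the ranges quoted in the table row."""
    return {
        "lr": 10 ** random.uniform(-5, 0),              # starting LR in [1e-5, 1.0)
        "weight_decay": 10 ** random.uniform(-10, 0),   # in [1e-10, 1.0)
        "output_penalty": 10 ** random.uniform(-7, 2),  # in [1e-7, 100); not applied below
        "dropout": random.choice([0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]),
    }


def train_one_trial(model, train_loader, epochs=100, device="cuda"):
    """One random-search trial: AdamW with cosine annealing of the LR to zero."""
    hp = sample_hparams()  # dropout would be set on the basis network; omitted here
    opt = torch.optim.AdamW(model.parameters(), lr=hp["lr"], weight_decay=hp["weight_decay"])
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs, eta_min=0.0)
    loss_fn = torch.nn.CrossEntropyLoss()   # MSE would be used for regression tasks
    model.to(device)
    for _ in range(epochs):
        for x, y in train_loader:
            opt.zero_grad()
            loss = loss_fn(model(x.to(device)), y.to(device))
            loss.backward()
            opt.step()
        sched.step()
    return hp  # the best trial would be selected on the validation split
```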
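The hardware row reports throughput as input instances processed per second on one 32 GB V100 in inference mode. A rough way to measure that quantity in PyTorch is sketched below; the batch size, warm-up length, and iteration count are assumptions, not the authors' benchmarking script.

```python
import time
import torch


@torch.no_grad()
def throughput(model, batch_size=1024, num_features=8, iters=100, device="cuda"):
    """Return inference throughput (instances per second, x / sec) on a single GPU."""
    model.eval().to(device)
    x = torch.randn(batch_size, num_features, device=device)
    for _ in range(10):              # warm-up iterations before timing
        model(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()         # ensure all GPU work is finished before stopping the clock
    elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed
```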