Neural Basis Models for Interpretability

Authors: Filip Radenovic, Abhimanyu Dubey, Dhruv Mahajan

NeurIPS 2022

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental. Evidence: 4 experiments.
Researcher Affiliation: Industry. Evidence: the code is released at github.com/facebookresearch/nbm-spam, indicating affiliation with Meta (Facebook) Research.
Pseudocode: No. The paper describes its architecture and methodology using mathematical equations and diagrams (e.g., Figure 1), but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code: Yes. Evidence: "Source code is available at github.com/facebookresearch/nbm-spam."
Open Datasets: Yes. Evidence: "Tabular datasets. We report performance on CA Housing [10, 46], FICO [22], Cover Type [8, 16, 20], and Newsgroups [32, 43] tabular datasets. ... We also report performance on MIMIC-II [41, 51], Credit [17, 19], Click [15], Epsilon [21], Higgs [3, 26], Microsoft [40, 49], Yahoo [60], and Year [66] tabular datasets. ... Image datasets (classification). We experiment with two bird classification datasets: CUB [18, 62] and iNaturalist Birds [27, 59]. ... Image dataset (object detection). For this task we use a proprietary object detection dataset, denoted as Common Objects
Dataset Splits: Yes. Evidence: "Data is split to have a 70/10/20 ratio for training, validation, and testing, respectively; except for Newsgroups, where the test split is fixed, so we only split the train part to an 85/15 ratio for train and validation. ... For these datasets, we follow [12, 47] to use the same training, validation, and testing splits"
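For concreteness, the 70/10/20 split quoted above can be sketched as follows (a minimal stdlib sketch; the function name, dataset size, and shuffling seed are illustrative, since the paper does not specify its shuffling procedure):

```python
import random

def split_indices(n, seed=0):
    """Shuffle indices and split them 70/10/20 into train/val/test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train, n_val = int(0.7 * n), int(0.1 * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = split_indices(1000)
# 700 train, 100 validation, and 200 test indices, with no overlap.
```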
Hardware Specification: Yes. Evidence: "Linear, NAM, NBM, and MLP models are trained using the Adam with decoupled weight decay (AdamW) optimizer [35], on 8 V100 GPU machines with 32 GB memory, and a batch size of at most 1024 per GPU (divided by 2 every time a batch cannot fit in the memory). ... The throughput is measured as the number of input instances that we can process per second (x / sec) on one 32 GB V100 GPU, in inference mode. ... Finally, for EBMs and XGBoost, CPU machines are used"
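The per-GPU batch-size rule quoted above (start at 1024 and halve whenever a batch does not fit in memory) amounts to a simple retry loop. A hypothetical sketch, where `attempt_batch` stands in for one forward/backward pass and raises `MemoryError` on an out-of-memory failure:

```python
def fit_batch_size(attempt_batch, max_batch=1024):
    """Halve the batch size until a batch fits in memory.

    `attempt_batch` is a stand-in for one training step and is
    expected to raise MemoryError when the batch does not fit.
    """
    bs = max_batch
    while bs > 1:
        try:
            attempt_batch(bs)
            return bs
        except MemoryError:
            bs //= 2
    return bs

# Example: pretend only batches of 256 or fewer fit in memory.
def attempt(bs):
    if bs > 256:
        raise MemoryError

print(fit_batch_size(attempt))  # tries 1024 -> 512 -> 256, prints 256
```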
Software Dependencies: No. The paper mentions PyTorch ("We implement the following baselines in PyTorch [48]...") but does not provide version numbers for PyTorch or any other software dependency.
Experiment Setup: Yes. Evidence: "We use mean squared error (MSE) for regression, and cross-entropy loss for classification. To avoid overfitting, we use the following regularization techniques: (i) L2-normalization (weight decay) [31] of parameters; (ii) batch-norm [28] and dropout [54] on hidden layers of the basis functions network; (iii) an L2-normalization penalty on the outputs fi to incentivize fewer strong feature contributions, as done in [2]; (iv) basis dropout to randomly drop individual basis functions in order to decorrelate them. ... MLP containing 3 hidden layers with 256, 128, and 128 units, ReLU [23], B = 100 basis outputs for NBMs and B = 200 for NB2Ms. ... Linear, NAM, NBM, and MLP models are trained using the Adam with decoupled weight decay (AdamW) optimizer [35], on 8 V100 GPU machines with 32 GB memory, and a batch size of at most 1024 per GPU (divided by 2 every time a batch cannot fit in the memory). We train for 1,000, 500, 100, or 50 epochs, depending on the size and feature dimensionality of the dataset. The learning rate is decayed with cosine annealing [34] from the starting value until zero. For NBMs on all datasets, we tune the starting learning rate in the continuous interval [1e-5, 1.0), weight decay in the interval [1e-10, 1.0), the output penalty coefficient in the interval [1e-7, 100), and the dropout and basis dropout coefficients in the discrete set {0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}. We find optimal hyper-parameters using the validation set and random search."
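The cosine-annealed learning-rate schedule and random hyper-parameter search described above can be sketched with the standard library (illustrative only; the function names are mine, and the log-uniform sampling scale for the continuous intervals is an assumption, as the paper does not state how the intervals are sampled):

```python
import math
import random

def sample_hparams(rng):
    """One draw of the random search over the reported intervals."""
    return {
        # Learning rate in [1e-5, 1.0); weight decay in [1e-10, 1.0).
        "lr": 10 ** rng.uniform(-5, 0),
        "weight_decay": 10 ** rng.uniform(-10, 0),
        # Dropout / basis dropout from the reported discrete set.
        "dropout": rng.choice([0.0, 0.05, 0.1, 0.2, 0.3, 0.4,
                               0.5, 0.6, 0.7, 0.8, 0.9]),
    }

def cosine_annealed_lr(lr0, epoch, total_epochs):
    """Decay from lr0 at epoch 0 to zero at the final epoch."""
    return 0.5 * lr0 * (1.0 + math.cos(math.pi * epoch / total_epochs))

rng = random.Random(0)
hp = sample_hparams(rng)
# Schedule for a 100-epoch budget: starts at hp["lr"], ends at 0.
schedule = [cosine_annealed_lr(hp["lr"], e, 100) for e in range(101)]
```

In a PyTorch training loop these two pieces would correspond to `torch.optim.AdamW` plus `torch.optim.lr_scheduler.CosineAnnealingLR` with `eta_min=0`.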