Interpretability with full complexity by constraining feature information

Authors: Kieran A. Murphy, Dani S. Bassett

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We develop a framework for extracting insight from the spectrum of approximate models and demonstrate its utility on a range of tabular datasets. Through experiments on a variety of tabular datasets and comparisons to other approaches in interpretable ML and feature selection, we demonstrate the following strengths of the approach…"
Researcher Affiliation | Academia | Kieran A. Murphy1 and Dani S. Bassett1,2,3,4,5,6,7 — 1 Dept. of Bioengineering, School of Engineering & Applied Science, U. of Pennsylvania, Philadelphia, PA 19104, USA; 2 Dept. of Electrical & Systems Engineering, School of Engineering & Applied Science, U. of Pennsylvania, Philadelphia, PA 19104, USA; 3 Dept. of Neurology, Perelman School of Medicine, U. of Pennsylvania, Philadelphia, PA 19104, USA; 4 Dept. of Psychiatry, Perelman School of Medicine, U. of Pennsylvania, Philadelphia, PA 19104, USA; 5 Dept. of Physics & Astronomy, College of Arts & Sciences, U. of Pennsylvania, Philadelphia, PA 19104, USA; 6 The Santa Fe Institute, Santa Fe, NM 87501, USA
Pseudocode | No | The paper describes the methods using prose and mathematical equations but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | "Code and additional examples may be found through the project page, distributed-information-bottleneck.github.io."
Open Datasets | Yes | "We analyzed a variety of tabular datasets with the Distributed IB... Bikeshare (Dua & Graff, 2017)... Mice Protein (Higuera et al., 2015)... MIMIC-II (Johnson et al., 2016)... We used the dataset preprocessing code released with NODE-GAM (Chang et al., 2022) on GitHub (https://github.com/zzzace2000/nodegam) with a minor modification to use one-hot encoding for categorical variables instead of leave-one-out encoding when input to the Distributed IB models, unless there were more than 100 categories."
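The categorical-encoding rule quoted above (one-hot unless a column exceeds 100 categories) can be sketched as follows. This is a hedged illustration, not the authors' actual preprocessing code: the function name `encode_categoricals` is hypothetical, and the paper does not specify what is done with high-cardinality columns, so this sketch simply leaves them unchanged.

```python
import pandas as pd

def encode_categoricals(df, max_categories=100):
    """One-hot encode categorical columns, mirroring the stated rule:
    one-hot unless a column has more than `max_categories` categories.
    High-cardinality columns are left as-is here (the paper's exact
    fallback encoding is not specified)."""
    out = df.copy()
    for col in df.select_dtypes(include=["object", "category"]).columns:
        if df[col].nunique() <= max_categories:
            dummies = pd.get_dummies(df[col], prefix=col)
            out = out.drop(columns=[col]).join(dummies)
    return out
```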
Dataset Splits | Yes | "For each dataset we tuned the following hyperparameters with a grid search: L2-regularization weight γ ∈ {0, 0.01, 0.1, 1} and the dropout fraction ∈ {0, 0.1, 0.3, 0.5}. We selected the parameters with the best integrated performance on the validation set (metric as a function of the number of features), and used the same hidden dimensions as the joint encoder for the Distributed IB method (2 layers of 256 units)."
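The reported hyperparameter sweep is a plain grid over the two listed values sets. A minimal sketch of that search loop is below; `train_and_validate` is a hypothetical stand-in for training a model and scoring it on the validation split, since the paper's models and its integrated-performance metric are not reproduced here.

```python
from itertools import product

def train_and_validate(l2_weight, dropout):
    """Hypothetical placeholder: train a model with these settings and
    return its validation score (higher is better). Here it just returns
    a dummy value so the search loop is runnable."""
    return -(l2_weight + dropout)

def grid_search():
    """Grid search over the two hyperparameters the report quotes:
    L2-regularization weight gamma and the dropout fraction."""
    l2_weights = [0, 0.01, 0.1, 1]   # gamma grid from the paper
    dropouts = [0, 0.1, 0.3, 0.5]    # dropout grid from the paper
    best = max(product(l2_weights, dropouts),
               key=lambda cfg: train_and_validate(*cfg))
    return best
```

With the dummy scorer above, the search simply returns the configuration with the smallest penalty; in practice each call would train a full model on the training split.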
Hardware Specification | Yes | "All Distributed IB experiments were run on a single computer with a 12 GB GeForce RTX 3060 GPU; 100k steps on the Bikeshare dataset takes a couple of minutes, and on Microsoft about 2 hours."
Software Dependencies | No | The paper mentions software like 'tensorflow.keras', 'PyTorch', and 'TensorFlow' but does not provide specific version numbers for any of these dependencies.
Experiment Setup | Yes | "Training hyperparameters and architecture details are shown in Tab. 2 (common to all datasets) and Tab. 3 (dataset-specific)." Table 2 includes: positional encoding frequencies [1, 2, 4, 8]; nonlinear activation Leaky ReLU (α = 0.2); feature encoder MLP [128, 128]; bottleneck embedding space dimension 8; joint encoder MLP [256, 256]; batch size 128; optimizer Adam; learning rate 3e-4; β_initial = 2e-5. Table 3 includes annealing steps and dropout fraction for the various datasets.
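To make the Table 2 architecture details concrete, here is a minimal NumPy sketch of the per-feature input path: a sinusoidal positional encoding at the listed frequencies [1, 2, 4, 8], followed by a [128, 128] MLP with Leaky ReLU (α = 0.2). This is an illustration under stated assumptions, not the authors' implementation: the table gives only the frequency list, so the 2π scaling inside the encoding is an assumption, and the MLP weights are random stand-ins.

```python
import numpy as np

FREQS = [1, 2, 4, 8]  # positional encoding frequencies from Table 2

def positional_encoding(x, freqs=FREQS):
    """Map a scalar feature to sin/cos features at each frequency.
    The 2*pi factor is an assumption; Table 2 lists only the frequencies."""
    x = np.asarray(x, dtype=float)
    feats = [f(2 * np.pi * f_ * x) for f_ in freqs for f in (np.sin, np.cos)]
    return np.stack(feats, axis=-1)

def leaky_relu(z, alpha=0.2):
    """Leaky ReLU with alpha = 0.2, per Table 2."""
    return np.where(z > 0, z, alpha * z)

def feature_encoder(x, layer_sizes=(128, 128), seed=0):
    """Feature-encoder MLP of shape [128, 128] from Table 2,
    randomly initialized purely to illustrate the architecture."""
    rng = np.random.default_rng(seed)
    h = positional_encoding(x)
    for size in layer_sizes:
        w = rng.normal(scale=0.1, size=(h.shape[-1], size))
        h = leaky_relu(h @ w)
    return h
```

In the paper's setup each such encoder output would then be compressed to the 8-dimensional bottleneck embedding before the [256, 256] joint encoder; those stages are omitted here for brevity.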