Covered Information Disentanglement: Model Transparency via Unbiased Permutation Importance

Authors: João P. B. Pereira, Erik S. G. Stroes, Aeilko H. Zwinderman, Evgeni Levin
Pages: 7984-7992

AAAI 2022

Reproducibility Variable Result LLM Response
Research Type Experimental We demonstrate its efficacy in adjusting permutation importance first on a controlled toy dataset and discuss its effect on real-world medical data. ... To test the CID ranking adjustment, we first tested it on a toy dataset where the real importances are known, and a real-world medical dataset.
Researcher Affiliation Collaboration (1) Amsterdam University Medical Center, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands; (2) Horaizon, Marshallaan 2, 2625 GZ Delft, The Netherlands
Pseudocode Yes Algorithm 1: CID Importance
Open Source Code Yes We make an implementation of CID publicly available at: https://github.com/JBPereira/CID.
Open Datasets Yes We demonstrate its efficacy in adjusting permutation importance first on a controlled toy dataset and discuss its effect on real-world medical data. ... Cardiovascular Risk Prediction dataset (Hoogeveen et al. 2020) ... EPIC-Norfolk study (Day et al. 1999).
Dataset Splits Yes We performed 100 shuffle splits with Extremely Randomized Trees... We performed 100 shuffle splits and measured the mean square error on the test set. We used 5-fold cross-validation to select the optimal hyper-parameters of a Survival Gradient Boosting regressor (Pölsterl 2020).
Hardware Specification Yes We also report the average running time per cycle conducted on an 8-core Intel(R) Core(TM) i7-7700HQ CPU @ 2.81 GHz.
Software Dependencies No The paper mentions "Python", "scikit-learn", and "scikit-survival" but gives no version numbers for any of them. For example, "We implemented CID in Python using scikit-learn's graphical lasso (Pedregosa et al. 2011)." and "...we used a Gradient Boosting Survival model (Pölsterl 2020) [scikit-survival]."
Experiment Setup Yes To test the CID correction, we performed 200 Shuffle Splits with Extremely Randomized Trees and computed the Gini importance for each feature, as well as the permutation importance (PI). We then adjusted the feature importances using the CID algorithm and Bayesian Regression as $\phi$ (see assumption 1). ... For the cardiovascular event survival analysis, we discretized the data into 10 bins. For this experiment we used: $e_i(f, s) = \phi_f\big(H^{c+}_{X_i}(s),\, H^{c-}_{X_i}(s),\, H^{+}_{i}(s),\, H^{-}_{i}(s)\big) = I_i(f, s)\, g\big(H^{c+}_{X_i}(s)\big)\, \big(1 - H^{c+}_{X_i}(s)\big)$, where $g\big(H^{c+}_{X_i}(s)\big) = \begin{cases} c, & \text{if } H^{c+}_{X_i}(s) > 0,\ c \in [1, +\infty[ \\ 1, & \text{otherwise} \end{cases}$, that is, the permutation importance is modelled as the true importance weighted by the fraction of uncovered information (disregarding synergy) scaled by c. We then found c using grid-search on the values: 1/c = [1.2, 1.4, 1.6, 1.8, 2, 2.2, 2.4]. We removed data instances that contained values exceeding 4 times the standard deviation to achieve better discretization.
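
The baseline steps quoted in the Dataset Splits and Experiment Setup rows (shuffle splits, Extremely Randomized Trees, Gini and permutation importance, outlier removal, 10-bin discretization) can be sketched with scikit-learn as below. This is a minimal illustration under stated assumptions, not the authors' code: the synthetic data, the helper names `drop_outliers` and `importances_over_splits`, the test fraction, the number of trees, and the uniform binning strategy are all hypothetical choices, and the CID adjustment itself is not implemented here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import ShuffleSplit
from sklearn.preprocessing import KBinsDiscretizer


def drop_outliers(X, y, n_std=4.0):
    """Keep rows where every feature lies within n_std standard deviations of its mean."""
    z = np.abs(X - X.mean(axis=0)) / X.std(axis=0)
    keep = (z <= n_std).all(axis=1)
    return X[keep], y[keep]


def importances_over_splits(X, y, n_splits=100, test_size=0.25, seed=0):
    """Average Gini importance, permutation importance, and test MSE over shuffle splits.

    test_size and n_estimators are assumptions; the summary does not state them.
    """
    splitter = ShuffleSplit(n_splits=n_splits, test_size=test_size, random_state=seed)
    gini, perm, mse = [], [], []
    for train_idx, test_idx in splitter.split(X):
        model = ExtraTreesRegressor(n_estimators=50, random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        # Impurity-based (Gini) importance, computed from the fitted trees.
        gini.append(model.feature_importances_)
        # Permutation importance, measured on the held-out split.
        result = permutation_importance(
            model, X[test_idx], y[test_idx], n_repeats=5, random_state=seed
        )
        perm.append(result.importances_mean)
        mse.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))
    return np.mean(gini, axis=0), np.mean(perm, axis=0), float(np.mean(mse))


# Synthetic stand-in data (assumption): the medical datasets are not reproduced here.
X, y = make_regression(n_samples=300, n_features=5, n_informative=2, random_state=0)
X, y = drop_outliers(X, y, n_std=4.0)

# Discretize each feature into 10 bins; the binning strategy is an assumption.
X_binned = KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="uniform").fit_transform(X)

# A small number of splits keeps the sketch fast; the quoted text uses 100 or 200.
gini_imp, perm_imp, test_mse = importances_over_splits(X, y, n_splits=10)
```

The two importance vectors returned here are the unadjusted quantities; in the paper's setup they would then be passed to the CID correction, for which the authors' repository is the reference implementation.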