A Debiased MDI Feature Importance Measure for Random Forests

Authors: Xiao Li, Yu Wang, Sumanta Basu, Karl Kumbier, Bin Yu

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | For both the simulated data and a genomic ChIP dataset, MDI-oob achieves state-of-the-art performance in feature selection from Random Forests for both deep and shallow trees.
Researcher Affiliation | Academia | Xiao Li (Statistics Department, UC Berkeley, sxli@berkeley.edu); Yu Wang (Statistics Department, UC Berkeley, wang.yu@berkeley.edu); Sumanta Basu (Statistics and Data Science Department, Computational Biology Department, Cornell University, sumbose@cornell.edu); Karl Kumbier (Statistics Department, UC Berkeley, kkumbier@berkeley.edu); Bin Yu (EECS, Statistics Department, UC Berkeley, binyu@berkeley.edu)
Pseudocode | No | The paper describes procedures and mathematical formulations but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | The source code is available at https://github.com/shifwang/paper-debiased-feature-importance
Open Datasets | Yes | To evaluate our method MDI-oob in a more realistic setting, we consider a ChIP-chip and ChIP-seq dataset measuring the enrichment of 80 biomolecules at 3912 regions of the Drosophila genome [5, 18].
Dataset Splits | Yes | Proposition 1 suggests that we can calculate the covariance between $y_i$ and $f_{T,k}(x_i)$ in Equation (12) using the out-of-bag samples $D \setminus D^{(T)}$: MDI-oob of feature $k$ $= \frac{1}{\lvert D \setminus D^{(T)} \rvert} \sum_{i \in D \setminus D^{(T)}} f_{T,k}(x_i)\, y_i$.
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as CPU or GPU models, or memory specifications.
Software Dependencies | No | The paper mentions several software packages (e.g., party, ranger, scikit-learn, XGBoost, treeinterpreter, Cython) but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | While keeping the number of trees to be 300, we vary the minimum leaf size of RF from 1 to 50 and record the MDI of every feature. We grow 100 trees with the minimum leaf size set to either 100 (shallow tree case) or 1 (deep tree case). The number of candidate features mtry is set to be 10.
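
The leaf-size sweep quoted under Experiment Setup can be sketched with scikit-learn, one of the packages the paper mentions. This is a minimal illustration under stated assumptions, not the authors' code: the synthetic data, the classification task, the grid of leaf sizes, and the helper name mdi_by_leaf_size are all hypothetical, and pairing 300 trees with mtry = 10 is a guess, since the quoted text does not say which experiment the mtry setting belongs to. The scores are scikit-learn's standard (in-bag) MDI, not MDI-oob.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def mdi_by_leaf_size(X, y, leaf_sizes, n_trees=300, mtry=10, seed=0):
    """Fit one forest per minimum leaf size and collect scikit-learn's MDI.

    Hypothetical helper. Returns an array of shape
    (len(leaf_sizes), n_features); each row sums to 1 because
    scikit-learn normalizes feature_importances_.
    """
    rows = []
    for leaf in leaf_sizes:
        forest = RandomForestClassifier(
            n_estimators=n_trees,    # "number of trees to be 300"
            min_samples_leaf=leaf,   # the quantity being varied
            max_features=mtry,       # candidate features per split (assumed here)
            random_state=seed,
        )
        forest.fit(X, y)
        rows.append(forest.feature_importances_)
    return np.vstack(rows)


# Synthetic data (an assumption, not the paper's simulation):
# features 0 and 1 carry signal, the remaining 18 are pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

leaf_sizes = [1, 5, 10, 25, 50]  # illustrative grid within the 1-to-50 range
mdi = mdi_by_leaf_size(X, y, leaf_sizes)

# Share of total MDI assigned to the noise features at each leaf size.
noise_share = mdi[:, 2:].sum(axis=1)
for leaf, share in zip(leaf_sizes, noise_share):
    print(f"min leaf size {leaf:>2}: MDI mass on noise features = {share:.3f}")
```

Comparing the printed noise share across leaf sizes gives a rough view of how much importance standard MDI places on irrelevant features as the trees grow deeper or shallower.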