A Debiased MDI Feature Importance Measure for Random Forests
Authors: Xiao Li, Yu Wang, Sumanta Basu, Karl Kumbier, Bin Yu
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For both the simulated data and a genomic ChIP dataset, MDI-oob achieves state-of-the-art performance in feature selection from Random Forests for both deep and shallow trees. |
| Researcher Affiliation | Academia | Xiao Li (Statistics Department, UC Berkeley, sxli@berkeley.edu); Yu Wang (Statistics Department, UC Berkeley, wang.yu@berkeley.edu); Sumanta Basu (Statistics and Data Science Department, Computational Biology Department, Cornell University, sumbose@cornell.edu); Karl Kumbier (Statistics Department, UC Berkeley, kkumbier@berkeley.edu); Bin Yu (EECS, Statistics Department, UC Berkeley, binyu@berkeley.edu) |
| Pseudocode | No | The paper describes procedures and mathematical formulations but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code is available at https://github.com/shifwang/paper-debiased-feature-importance |
| Open Datasets | Yes | To evaluate our method MDI-oob in a more realistic setting, we consider a ChIP-chip and ChIP-seq dataset measuring the enrichment of 80 biomolecules at 3912 regions of the Drosophila genome [5, 18]. |
| Dataset Splits | Yes | Proposition 1 suggests that we can calculate the covariance between $y_i$ and $f_{T,k}(x_i)$ in Equation (12) using the out-of-bag samples $D \setminus D^{(T)}$: MDI-oob of feature $k$ $= \frac{1}{\lvert D \setminus D^{(T)} \rvert} \sum_{i \in D \setminus D^{(T)}} f_{T,k}(x_i)\, y_i$. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions several software packages (e.g., party, ranger, scikit-learn, XGBoost, treeinterpreter, Cython) but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | While keeping the number of trees to be 300, we vary the minimum leaf size of RF from 1 to 50 and record the MDI of every feature. We grow 100 trees with the minimum leaf size set to either 100 (shallow tree case) or 1 (deep tree case). The number of candidate features mtry is set to be 10. |
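
The out-of-bag average quoted in the Dataset Splits row can be illustrated with a minimal sketch. It assumes the per-sample contributions $f_{T,k}(x_i)$ for one tree and one feature have already been computed elsewhere and are passed in as a plain list; the function names `oob_indices` and `mdi_oob_single_tree` are hypothetical and are not taken from the authors' repository.

```python
import random


def oob_indices(n_samples, rng):
    """Draw one bootstrap sample and return (in_bag, out_of_bag) index lists.

    Illustrative only: a standard bootstrap with replacement, where the
    out-of-bag set holds every index that was never drawn.
    """
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


def mdi_oob_single_tree(contributions, y, out_of_bag):
    """Average f_{T,k}(x_i) * y_i over the out-of-bag samples of one tree.

    `contributions[i]` is assumed to hold f_{T,k}(x_i) for sample i;
    how that quantity is obtained from a fitted tree is not shown here.
    """
    if not out_of_bag:
        raise ValueError("no out-of-bag samples for this tree")
    total = sum(contributions[i] * y[i] for i in out_of_bag)
    return total / len(out_of_bag)


if __name__ == "__main__":
    # Toy inputs, invented purely to exercise the formula.
    contributions = [0.5, -0.5, 1.0, 0.0]
    y = [2.0, 4.0, 1.0, 3.0]
    print(mdi_oob_single_tree(contributions, y, out_of_bag=[1, 2]))
```

With the toy inputs, only samples 1 and 2 enter the average, so in-bag samples have no influence on the score. This sketch covers a single tree only; combining scores across the trees of a forest is left out.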