reproducibilityindex.ai

A Debiased MDI Feature Importance Measure for Random Forests

Authors: Xiao Li, Yu Wang, Sumanta Basu, Karl Kumbier, Bin Yu

NeurIPS 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	For both the simulated data and a genomic ChIP dataset, MDI-oob achieves state-of-the-art performance in feature selection from Random Forests for both deep and shallow trees.
Researcher Affiliation	Academia	Xiao Li, Statistics Department, UC Berkeley, sxli@berkeley.edu; Yu Wang, Statistics Department, UC Berkeley, wang.yu@berkeley.edu; Sumanta Basu, Statistics and Data Science Department, Computational Biology Department, Cornell University, sumbose@cornell.edu; Karl Kumbier, Statistics Department, UC Berkeley, kkumbier@berkeley.edu; Bin Yu, EECS, Statistics Department, UC Berkeley, binyu@berkeley.edu
Pseudocode	No	The paper describes procedures and mathematical formulations but does not include any pseudocode or algorithm blocks.
Open Source Code	Yes	The source code is available at https://github.com/shifwang/paper-debiased-feature-importance
Open Datasets	Yes	To evaluate our method MDI-oob in a more realistic setting, we consider a ChIP-chip and ChIP-seq dataset measuring the enrichment of 80 biomolecules at 3912 regions of the Drosophila genome [5, 18].
Dataset Splits	Yes	Proposition 1 suggests that we can calculate the covariance between $y_i$ and $f_{T,k}(x_i)$ in Equation (12) using the out-of-bag samples $D \setminus D^{(T)}$: MDI-oob of feature $k = \frac{1}{|D \setminus D^{(T)}|} \sum_{i \in D \setminus D^{(T)}} f_{T,k}(x_i)\, y_i$.
Hardware Specification	No	The paper does not provide specific details about the hardware used for experiments, such as CPU or GPU models, or memory specifications.
Software Dependencies	No	The paper mentions several software packages (e.g., party, ranger, scikit-learn, XGBoost, treeinterpreter, Cython) but does not provide specific version numbers for these dependencies.
Experiment Setup	Yes	While keeping the number of trees to be 300, we vary the minimum leaf size of RF from 1 to 50 and record the MDI of every feature. We grow 100 trees with the minimum leaf size set to either 100 (shallow tree case) or 1 (deep tree case). The number of candidate features mtry is set to be 10.
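
The experiment setup quoted above (a fixed number of trees, a varying minimum leaf size, MDI recorded per feature) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it assumes scikit-learn's `RandomForestClassifier`, whose `feature_importances_` attribute reports normalized impurity-based importance (MDI). The helper name `mdi_by_leaf_size`, the synthetic data, the classification setting, and the reduced tree count in the demo are all assumptions made for the example. The paper's `mtry` corresponds to `max_features` here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def mdi_by_leaf_size(X, y, leaf_sizes, n_trees=300, max_features="sqrt", seed=0):
    """Fit one forest per minimum leaf size; return {leaf_size: MDI array}.

    Hypothetical helper for illustration only.
    """
    results = {}
    for leaf in leaf_sizes:
        forest = RandomForestClassifier(
            n_estimators=n_trees,       # number of trees (300 in the quoted setup)
            min_samples_leaf=leaf,      # minimum leaf size being varied
            max_features=max_features,  # plays the role of mtry
            random_state=seed,
        )
        forest.fit(X, y)
        # scikit-learn's impurity-based importance, normalized to sum to 1
        results[leaf] = forest.feature_importances_
    return results


# Synthetic data (an assumption): only feature 0 carries signal.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 6
X = rng.normal(size=(n_samples, n_features))
y = (X[:, 0] + 0.5 * rng.normal(size=n_samples) > 0).astype(int)

# Small demo: two leaf sizes and 30 trees to keep the run short.
mdi = mdi_by_leaf_size(X, y, leaf_sizes=(1, 50), n_trees=30)
for leaf, importances in mdi.items():
    print(leaf, np.round(importances, 3))
```

Comparing the importance assigned to the noise features at leaf size 1 against leaf size 50 gives a quick way to see how tree depth affects MDI on a given dataset.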