Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Debiased MDI Feature Importance Measure for Random Forests

Authors: Xiao Li, Yu Wang, Sumanta Basu, Karl Kumbier, Bin Yu

NeurIPS 2019

Reproducibility Variable Result LLM Response
Research Type Experimental For both the simulated data and a genomic ChIP dataset, MDI-oob achieves state-of-the-art performance in feature selection from Random Forests for both deep and shallow trees.
Researcher Affiliation Academia Xiao Li, Statistics Department, UC Berkeley, EMAIL; Yu Wang, Statistics Department, UC Berkeley, EMAIL; Sumanta Basu, Statistics and Data Science Department, Computational Biology Department, Cornell University, EMAIL; Karl Kumbier, Statistics Department, UC Berkeley, EMAIL; Bin Yu, EECS, Statistics Department, UC Berkeley, EMAIL
Pseudocode No The paper describes procedures and mathematical formulations but does not include any pseudocode or algorithm blocks.
Open Source Code Yes The source code is available at https://github.com/shifwang/paper-debiased-feature-importance
Open Datasets Yes To evaluate our method MDI-oob in a more realistic setting, we consider a ChIP-chip and ChIP-seq dataset measuring the enrichment of 80 biomolecules at 3912 regions of the Drosophila genome [5, 18].
Dataset Splits Yes Proposition 1 suggests that we can calculate the covariance between $y_i$ and $f_{T,k}(x_i)$ in Equation (12) using the out-of-bag samples $D \setminus D^{(T)}$: MDI-oob of feature $k$ = $\frac{1}{\lvert D \setminus D^{(T)} \rvert} \sum_{i \in D \setminus D^{(T)}} f_{T,k}(x_i) \cdot y_i$.
Hardware Specification No The paper does not provide specific details about the hardware used for experiments, such as CPU or GPU models, or memory specifications.
Software Dependencies No The paper mentions several software packages (e.g., party, ranger, scikit-learn, XGBoost, treeinterpreter, Cython) but does not provide specific version numbers for these dependencies.
Experiment Setup Yes While keeping the number of trees to be 300, we vary the minimum leaf size of RF from 1 to 50 and record the MDI of every feature. We grow 100 trees with the minimum leaf size set to either 100 (shallow tree case) or 1 (deep tree case). The number of candidate features mtry is set to be 10.
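
The leaf-size sweep quoted in the Experiment Setup row can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes scikit-learn's `RandomForestClassifier`, treats `min_samples_leaf` as the "minimum leaf size", and reads the standard MDI from `feature_importances_`. The helper name `mdi_by_leaf_size`, the synthetic data, and the reduced tree count in the usage lines are hypothetical choices made only to keep the example small.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def mdi_by_leaf_size(X, y, leaf_sizes, n_trees=300, seed=0):
    """Fit one forest per minimum leaf size and collect its MDI scores.

    Returns an array of shape (len(leaf_sizes), n_features).
    Hypothetical helper; the paper's own pipeline may differ.
    """
    rows = []
    for leaf in leaf_sizes:
        rf = RandomForestClassifier(
            n_estimators=n_trees,      # 300 trees, as quoted in the setup
            min_samples_leaf=leaf,     # assumed mapping for "minimum leaf size"
            random_state=seed,
        )
        rf.fit(X, y)
        # scikit-learn exposes the (normalized) MDI as feature_importances_
        rows.append(rf.feature_importances_)
    return np.vstack(rows)


# Synthetic data (an assumption): only feature 0 carries signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

leaf_sizes = [1, 10, 50]
# Fewer trees than the quoted 300, purely to keep the sketch fast.
mdi = mdi_by_leaf_size(X, y, leaf_sizes, n_trees=30)
```

Each row of `mdi` holds the per-feature MDI for one leaf size, so comparing rows shows how the scores of the noise features change as trees get shallower.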