Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
A Debiased MDI Feature Importance Measure for Random Forests
Authors: Xiao Li, Yu Wang, Sumanta Basu, Karl Kumbier, Bin Yu
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For both the simulated data and a genomic ChIP dataset, MDI-oob achieves state-of-the-art performance in feature selection from Random Forests for both deep and shallow trees. |
| Researcher Affiliation | Academia | Xiao Li (Statistics Department, UC Berkeley); Yu Wang (Statistics Department, UC Berkeley); Sumanta Basu (Statistics and Data Science Department, Computational Biology Department, Cornell University); Karl Kumbier (Statistics Department, UC Berkeley); Bin Yu (EECS, Statistics Department, UC Berkeley) |
| Pseudocode | No | The paper describes procedures and mathematical formulations but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code is available at https://github.com/shifwang/paper-debiased-feature-importance |
| Open Datasets | Yes | To evaluate our method MDI-oob in a more realistic setting, we consider a ChIP-chip and ChIP-seq dataset measuring the enrichment of 80 biomolecules at 3912 regions of the Drosophila genome [5, 18]. |
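| Dataset Splits | Yes | Proposition 1 suggests that we can calculate the covariance between $y_i$ and $f_{T,k}(x_i)$ in Equation (12) using the out-of-bag samples $\mathcal{D} \setminus \mathcal{D}^{(T)}$: MDI-oob of feature $k$ $= \frac{1}{\lvert \mathcal{D} \setminus \mathcal{D}^{(T)} \rvert} \sum_{i \in \mathcal{D} \setminus \mathcal{D}^{(T)}} f_{T,k}(x_i) \cdot y_i$. |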
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions several software packages (e.g., party, ranger, scikit-learn, XGBoost, treeinterpreter, Cython) but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | While keeping the number of trees to be 300, we vary the minimum leaf size of RF from 1 to 50 and record the MDI of every feature. We grow 100 trees with the minimum leaf size set to either 100 (shallow tree case) or 1 (deep tree case). The number of candidate features mtry is set to be 10. |
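
The out-of-bag formula quoted in the Dataset Splits row can be illustrated with a minimal sketch for a single tree. It assumes the per-sample, per-feature contributions f_{T,k}(x_i) have already been computed by some other means and are passed in as a plain nested list. The function name `mdi_oob_single_tree` and the parameters `contributions`, `y`, and `oob_indices` are hypothetical names chosen for illustration; this is not the authors' implementation.

```python
from typing import List, Sequence


def mdi_oob_single_tree(
    contributions: Sequence[Sequence[float]],
    y: Sequence[float],
    oob_indices: Sequence[int],
) -> List[float]:
    """Average f_{T,k}(x_i) * y_i over the out-of-bag samples of one tree.

    contributions[i][k] is assumed to hold f_{T,k}(x_i), the contribution
    of feature k to the tree's prediction for sample i (hypothetical input,
    precomputed elsewhere).
    """
    if len(oob_indices) == 0:
        raise ValueError("the tree has no out-of-bag samples")

    n_features = len(contributions[0])
    totals = [0.0] * n_features

    # Sum f_{T,k}(x_i) * y_i over the out-of-bag samples only.
    for i in oob_indices:
        for k in range(n_features):
            totals[k] += contributions[i][k] * y[i]

    # Divide by the number of out-of-bag samples for this tree.
    return [total / len(oob_indices) for total in totals]


# Toy example: 3 samples, 2 features; samples 0 and 2 are out-of-bag.
toy_contributions = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
toy_y = [1.0, 0.0, 2.0]
print(mdi_oob_single_tree(toy_contributions, toy_y, [0, 2]))  # [5.5, 7.0]
```

Only the samples that were not used to grow the tree enter the sum, which is why the paper's formula is counted under Dataset Splits: the bootstrap sample acts as the training set and the out-of-bag samples as a held-out set.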