Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
MonoForest framework for tree ensemble analysis
Authors: Igor Kuralenok, Vasilii Ershov, Igor Labutin
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, it shows comparable results with state-of-the-art interpretation techniques. Another application of the framework is ensemble-wise pruning: we can drop monomials from the polynomial, based on train data statistics. This way we reduce the model size up to 3 times without loss of its quality. For our experiment we used a model built by LightGBM for the Higgs dataset, transferred it to an ensemble of symmetric trees by Algorithm 1, and then compared the execution times of the original model and the transformed version. Obtained ROC AUC values are presented in the second (original model) and third (pruned model) columns of Table 1. The experimental results allow us to claim that the tree ensemble can be significantly reduced without loss of model quality in a variety of practical tasks. To demonstrate the quality of the proposed approach we built a decision tree ensemble binary one-vs-rest classifier for each class of the MNIST dataset using CatBoost and analyzed each classifier using three methods: MonoForest, SHAP, and permutation-based Model Reliance, proposed by Fisher et al. (8). |
| Researcher Affiliation | Collaboration | Igor Kuralenok Yandex / JetBrains Research EMAIL Vasily Ershov Yandex EMAIL Igor Labutin Yandex / SPb HSE EMAIL |
| Pseudocode | Yes | Algorithm 1: Greedy ensemble composition algorithm. |
| Open Source Code | No | No explicit statement about releasing source code or a link to a code repository for the described methodology is provided in the paper. |
| Open Datasets | No | The paper mentions 'Higgs dataset' and 'MNIST dataset' and 'publicly-available binary classification datasets' but does not provide a specific link, DOI, repository name, or a formal citation with author and year to access these datasets. |
| Dataset Splits | No | The paper mentions 'we split the data into train/validate/test groups' and 'parameters were tuned on train/validation pair' but does not provide specific percentages, sample counts, or a citation to predefined splits for the validation set. |
| Hardware Specification | Yes | For experiment we have used dual-socket server with Intel Xeon CPU E5-2650 and 256GB of RAM. |
| Software Dependencies | Yes | CatBoost version was equal to 0.14.2 |
| Experiment Setup | No | The paper states, 'It is important to note that we tuned the optimal gradient step, number of trees in the ensemble, regularization parameters of single trees, etc.' but does not provide the specific values for these hyperparameters or detailed system-level training settings in the main text. |
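The ensemble-wise pruning quoted in the Research Type row can be sketched for the simplest case: depth-1 trees (stumps). Each stump `f(x) = left + (right - left) * [x[feature] > threshold]` contributes a bias term plus one weighted indicator monomial, so the whole ensemble collapses into a single polynomial whose small-weight monomials can be dropped. This is a minimal illustration, not the paper's implementation: the helper names and the stump-only restriction are assumptions, and MonoForest itself handles arbitrary-depth (oblivious) trees and data-driven pruning statistics.

```python
# Minimal sketch: stump ensemble -> polynomial of indicator monomials -> pruning.
# Assumed representation: each stump is (feature, threshold, left_value, right_value).
from collections import defaultdict


def stumps_to_polynomial(stumps):
    """Collapse a stump ensemble into a bias plus weighted indicator monomials.

    Each stump evaluates to left + (right - left) * [x[feature] > threshold],
    so monomials sharing the same (feature, threshold) condition merge by
    summing their weights.
    """
    bias = 0.0
    monomials = defaultdict(float)  # (feature, threshold) -> weight
    for feature, threshold, left, right in stumps:
        bias += left
        monomials[(feature, threshold)] += right - left
    return bias, dict(monomials)


def prune(bias, monomials, eps):
    """Ensemble-wise pruning: drop monomials whose absolute weight is below eps.

    A stand-in criterion; the paper prunes based on train-data statistics.
    """
    return bias, {k: w for k, w in monomials.items() if abs(w) >= eps}


def predict(bias, monomials, x):
    """Evaluate the polynomial: bias plus weights of satisfied conditions."""
    return bias + sum(w for (f, t), w in monomials.items() if x[f] > t)
```

For example, two stumps splitting on the same condition merge into one monomial, so the polynomial form can already be smaller than the original ensemble before any pruning.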