Finding Influential Training Samples for Gradient Boosted Decision Trees

Authors: Boris Sharchilev, Yury Ustinovskiy, Pavel Serdyukov, Maarten de Rijke

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We evaluate our approaches on various experimental setups and use-case scenarios and demonstrate both the quality of our approach to finding influential training samples in comparison to the baselines and its computational efficiency. |
| Researcher Affiliation | Collaboration | 1 Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands; 2 Yandex, Moscow, Russia; 3 Department of Mathematics, Princeton University, Princeton, NJ, USA. |
| Pseudocode | Yes | Algorithm 1 (LeafRefit) |
| Open Source Code | Yes | Supporting code for the paper is available at https://github.com/bsharchilev/influence_boosting. |
| Open Datasets | Yes | The datasets used for evaluation are: (1) the Adult data set (Adult; dat, 1996), (2) the Amazon Employee Access Challenge dataset (Amazon; dat, 2013), (3) the KDD Cup 2009 Upselling dataset (Upselling; dat, 2009) and, for the domain-mismatch experiment, (4) the Hospital Readmission dataset (Strack et al., 2014). |
| Dataset Splits | No | The paper mentions splitting training points for specific analyses and constructing training sets for the domain-mismatch experiments, but it does not give explicit train/validation/test splits (percentages, counts, or standard splits) for general model training or hyperparameter tuning. |
| Hardware Specification | No | The paper does not specify the hardware (CPU/GPU models, memory, or other machine details) used to run its experiments. |
| Software Dependencies | No | For our experiments with GBDT, we use CatBoost (cat, 2018), an open-source implementation of GBDT by Yandex. (CatBoost is named, but no version number or other dependency versions are given.) |
| Experiment Setup | No | Dataset statistics and corresponding CatBoost parameters can be found in the supplementary material. No specific hyperparameters or training configurations are provided directly in the main text. |
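The Pseudocode row refers to the paper's Algorithm 1 (LeafRefit), which estimates a training point's influence by keeping the learned tree structures fixed and refitting only the leaf values with that point removed. The sketch below is a minimal, simplified illustration of that idea for a single tree with mean-residual leaf values and squared-error loss; it ignores how removing a point changes the gradients of later boosting iterations, which the paper's full algorithm accounts for, and all function and variable names are hypothetical rather than taken from the supporting repository.

```python
import numpy as np

def refit_leaf_values(leaf_ids, residuals, exclude=None):
    """Recompute each leaf's value as the mean residual of the training
    points routed to it, optionally excluding one training index."""
    values = {}
    for leaf in np.unique(leaf_ids):
        idx = np.where(leaf_ids == leaf)[0]
        if exclude is not None:
            idx = idx[idx != exclude]
        values[leaf] = residuals[idx].mean() if len(idx) else 0.0
    return values

def leaf_refit_influence(leaf_ids_train, residuals, test_leaf, test_target, remove_idx):
    """Change in squared-error loss on one test point when training point
    `remove_idx` is removed and the leaf values are refit (structure fixed)."""
    full = refit_leaf_values(leaf_ids_train, residuals)
    loo = refit_leaf_values(leaf_ids_train, residuals, exclude=remove_idx)
    loss_full = (test_target - full[test_leaf]) ** 2
    loss_loo = (test_target - loo[test_leaf]) ** 2
    return loss_loo - loss_full

# Toy usage: six training points routed to two leaves of one tree.
leaf_ids_train = np.array([0, 0, 0, 1, 1, 1])
residuals = np.array([1.0, 1.2, 0.8, -1.0, -0.9, -1.1])
delta = leaf_refit_influence(leaf_ids_train, residuals,
                             test_leaf=0, test_target=1.0, remove_idx=2)
print(delta)  # positive: removing point 2 increases the test loss, so it was helpful
```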
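The Open Datasets and Dataset Splits rows note that the evaluation data are public but that explicit splits are not reported. As a convenience only, the UCI Adult data set can typically be fetched through OpenML as sketched below; the OpenML name/version and the 80/20 split are assumptions for illustration, not the loading procedure or splits used in the paper.

```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

# Assumption: UCI Adult is available on OpenML under the name "adult" (version 2);
# the paper itself cites the UCI repository directly (dat, 1996).
adult = fetch_openml("adult", version=2, as_frame=True)
X, y = adult.data, adult.target

# The paper does not state its splits, so this 80/20 split is purely illustrative.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
print(X_train.shape, X_test.shape)
```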
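The Software Dependencies row notes that the experiments use CatBoost without pinning a version. For reference, a minimal CatBoost training call looks like the following; the hyperparameter values here are placeholders, not the settings from the paper's supplementary material, and the synthetic data simply keeps the example self-contained.

```python
import numpy as np
from catboost import CatBoostClassifier

# Tiny synthetic binary-classification data; the paper's actual data are the
# Adult, Amazon, Upselling, and Hospital Readmission datasets listed above.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Placeholder hyperparameters; the settings actually used in the paper are
# given in its supplementary material and are not reproduced here.
model = CatBoostClassifier(
    iterations=100,
    learning_rate=0.1,
    depth=6,
    loss_function="Logloss",
    verbose=False,
)
model.fit(X, y)
print(model.predict_proba(X[:3]))  # class probabilities for the first three rows
```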