Finding Influential Training Samples for Gradient Boosted Decision Trees
Authors: Boris Sharchilev, Yury Ustinovskiy, Pavel Serdyukov, Maarten de Rijke
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approaches on various experimental setups and use-case scenarios and demonstrate both the quality of our approach to finding influential training samples in comparison to the baselines and its computational efficiency. |
| Researcher Affiliation | Collaboration | Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands; Yandex, Moscow, Russia; Department of Mathematics, Princeton University, Princeton, NJ, USA. |
| Pseudocode | Yes | Algorithm 1 LeafRefit |
| Open Source Code | Yes | Supporting code for the paper is available at https://github.com/bsharchilev/influence_boosting. |
| Open Datasets | Yes | The datasets used for evaluation are: (1) Adult Data Set (Adult, (dat, 1996)), (2) Amazon Employee Access Challenge dataset (Amazon, (dat, 2013)), (3) the KDD Cup 2009 Upselling dataset (Upselling, (dat, 2009)) and, for the domain mismatch experiment, (4) the Hospital Readmission dataset (Strack et al., 2014). |
| Dataset Splits | No | The paper mentions splitting training points for specific analyses and creating training sets for domain mismatch experiments, but does not provide specific train/validation/test dataset split information (percentages, counts, or explicit standard splits) for general model training or hyperparameter tuning. |
| Hardware Specification | No | The paper does not provide specific hardware details (GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | For our experiments with GBDT, we use CatBoost (cat, 2018), an open-source implementation of GBDT by Yandex. (This mentions CatBoost but gives no version number, nor other dependencies with versions.) |
| Experiment Setup | No | Dataset statistics and corresponding CatBoost parameters can be found in the supplementary material. No specific hyperparameters or training configurations are given in the main text. |
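The LeafRefit pseudocode cited above recomputes a tree's leaf values after removing a single training point while keeping the tree structure fixed. For squared loss, where a leaf's optimal value is the mean of the residuals it contains, the idea can be illustrated with a toy sketch (helper names are hypothetical, not from the paper's code):

```python
def leaf_value(residuals):
    """Optimal leaf value under squared loss: the mean residual in the leaf."""
    return sum(residuals) / len(residuals)

def refit_without(residuals, i):
    """Leaf value refit with training point i removed, tree structure fixed."""
    kept = residuals[:i] + residuals[i + 1:]
    return leaf_value(kept)

def influence_on_leaf(residuals, i):
    """Change in the leaf's value caused by removing point i."""
    return refit_without(residuals, i) - leaf_value(residuals)

# Removing an outlier residual shifts the leaf value noticeably,
# flagging that point as influential for predictions routed to this leaf.
res = [1.0, 2.0, 3.0, 10.0]
print(influence_on_leaf(res, 3))  # -2.0
```

The actual LeafRefit algorithm aggregates such per-leaf changes across all trees of the ensemble (and the paper's faster LeafInfluence variant approximates them), but the one-leaf case above captures the core leave-one-out computation.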