Minimal Variance Sampling in Stochastic Gradient Boosting

Authors: Bulat Ibragimov, Gleb Gusev

NeurIPS 2019

Reproducibility variables, results, and supporting LLM responses:

Research Type: Experimental
LLM response: "Here we provide experimental results of the MVS algorithm on two popular open-source implementations of gradient boosting: CatBoost and LightGBM. The results show a significant improvement over the existing default: 97 wins for MVS versus 55 wins for the default setting, and a +0.12% mean ROC-AUC improvement."

Researcher Affiliation: Collaboration
LLM response: Bulat Ibragimov (Yandex, Moscow, Russia; Moscow Institute of Physics and Technology; ibrbulat@yandex.ru) and Gleb Gusev (Sberbank, Moscow, Russia; gusev.g.g@sberbank.ru).

Pseudocode: Yes
LLM response: "Algorithm 1: MVS Algorithm; Algorithm 2: Calculate Threshold." A hedged sketch of both routines follows below.

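For orientation, here is a minimal NumPy sketch of the two routines. It is not the authors' implementation: it assumes the regularized per-instance score s_i = sqrt(g_i^2 + λ) (the paper defines the exact regularizer), and it replaces the partition-based threshold search of Algorithm 2 with a plain binary search.

```python
import numpy as np

def calculate_threshold(scores, sample_rate, n_iter=60):
    """Find mu with sum_i min(1, s_i / mu) ~= sample_rate * n.

    A binary-search stand-in for the paper's Algorithm 2
    (Calculate Threshold), which uses a faster partition scheme.
    """
    target = sample_rate * len(scores)
    lo, hi = 0.0, float(scores.max()) / max(sample_rate, 1e-12)
    for _ in range(n_iter):
        mu = 0.5 * (lo + hi)
        if np.minimum(1.0, scores / mu).sum() > target:
            lo = mu  # expected sample too large: raise the threshold
        else:
            hi = mu
    return 0.5 * (lo + hi)

def mvs_sample(gradients, sample_rate, lam, rng):
    """One MVS step (Algorithm 1, sketched): keep instance i with
    probability p_i = min(1, s_i / mu) and reweight kept instances
    by 1 / p_i so that gradient sums stay unbiased."""
    scores = np.sqrt(gradients ** 2 + lam)
    mu = calculate_threshold(scores, sample_rate)
    probs = np.minimum(1.0, scores / mu)
    kept = rng.random(len(probs)) < probs
    weights = np.zeros_like(probs)
    weights[kept] = 1.0 / probs[kept]
    return kept, weights

# Example: sample roughly 80% of 100,000 instances with lambda = 0.1.
rng = np.random.default_rng(0)
kept, weights = mvs_sample(rng.normal(size=100_000), 0.8, 0.1, rng)
```

Here kept marks the sampled instances and weights carries the importance weights applied when fitting the next tree.
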
Open Source Code: Yes
LLM response: "The source code of MVS is publicly available [6] and ready to be used as a default option of the CatBoost algorithm. ... The MVS source code for LightGBM may be found at [19]."

Open Datasets: Yes
LLM response: "Table 1: Datasets description; KDD Internet [1], Adult [25], Amazon [23], KDD Upselling [11], Kick prediction [22], KDD Churn [10], Click prediction [12]. All the datasets are publicly available and were preprocessed according to [5]."

Dataset Splits: Yes
LLM response: "For tuning the sampling parameters of each algorithm (sample rate and λ coefficient for MVS, large-gradients fraction and small-gradients fraction for GOSS, sample rate for SGB), we use 5-fold cross-validation on the train subset of the data." A sketch of such a tuning loop follows below.

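As a concrete illustration of this tuning protocol, the following sketch grid-searches the MVS sample rate and λ coefficient with stratified 5-fold cross-validation through CatBoost's scikit-learn interface. The grids and the synthetic data are illustrative assumptions rather than the paper's search space, and the bootstrap_type="MVS", subsample, and mvs_reg parameter names reflect recent CatBoost releases and may differ across versions.

```python
import numpy as np
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=2000, random_state=0)  # stand-in data

def cv_auc(subsample, mvs_reg, n_splits=5):
    """Mean validation ROC-AUC of MVS CatBoost over 5 stratified folds."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    aucs = []
    for tr, va in folds.split(X, y):
        model = CatBoostClassifier(
            bootstrap_type="MVS",  # MVS sampling
            subsample=subsample,   # sample rate
            mvs_reg=mvs_reg,       # the lambda coefficient
            iterations=200,        # small budget to keep the sketch fast
            verbose=False,
        )
        model.fit(X[tr], y[tr])
        aucs.append(roc_auc_score(y[va], model.predict_proba(X[va])[:, 1]))
    return float(np.mean(aucs))

grid = [(r, l) for r in (0.5, 0.7, 0.8, 0.9) for l in (0.0, 0.1, 1.0, 10.0)]
best_rate, best_lam = max(grid, key=lambda p: cv_auc(*p))
```
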
Hardware Specification: No
LLM response: The paper does not specify the exact hardware (e.g., CPU or GPU models, memory) used to run the experiments; it only mentions the use of the CatBoost and LightGBM libraries.

Software Dependencies: No
LLM response: The paper names software such as CatBoost and LightGBM but does not give version numbers for these or for other relevant dependencies, such as programming languages or libraries.

Experiment Setup: Yes
LLM response: "We implemented MVS in CatBoost and performed a benchmark comparison of MVS with sampling ratio 80% and default CatBoost with no sampling... The algorithms were compared by the ROC-AUC metric... For tuning the sampling parameters of each algorithm (sample rate and λ coefficient for MVS, large-gradients fraction and small-gradients fraction for GOSS, sample rate for SGB), we use 5-fold cross-validation on the train subset of the data... The evaluation part is run 10 times with different seeds. The final result is defined as the mean over these 10 runs." A sketch of this seed-averaged comparison follows below.

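The seed-averaged comparison described in that setup could look roughly like the sketch below: MVS with an 80% sampling ratio against a no-sampling baseline, each trained with 10 seeds and compared by mean test ROC-AUC. The synthetic split and the mean_auc_over_seeds helper are stand-ins for the paper's datasets and code, and CatBoost parameter names may vary by version.

```python
import numpy as np
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, random_state=0)  # stand-in data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

def mean_auc_over_seeds(params, n_seeds=10):
    """Train with n_seeds different seeds and report the mean test
    ROC-AUC, mirroring the paper's 10-run evaluation protocol."""
    aucs = []
    for seed in range(n_seeds):
        model = CatBoostClassifier(
            random_seed=seed, iterations=200, verbose=False, **params
        )
        model.fit(X_tr, y_tr)
        aucs.append(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
    return float(np.mean(aucs))

mvs_auc = mean_auc_over_seeds({"bootstrap_type": "MVS", "subsample": 0.8})
base_auc = mean_auc_over_seeds({"bootstrap_type": "No"})  # no sampling
print(f"MVS 80%: {mvs_auc:.4f} vs no sampling: {base_auc:.4f}")
```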