reproducibilityindex.ai

Minimal Variance Sampling in Stochastic Gradient Boosting

Authors: Bulat Ibragimov, Gleb Gusev

NeurIPS 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Here we provide experimental results of MVS algorithm on two popular open-source implementations of gradient boosting: CatBoost and LightGBM. The results show significant improvement over the existing default: 97 wins of MVS versus 55 wins of default setting and +0.12% mean ROC-AUC improvement.
Researcher Affiliation	Collaboration	Bulat Ibragimov (Yandex, Moscow, Russia; Moscow Institute of Physics and Technology; ibrbulat@yandex.ru); Gleb Gusev (Sberbank, Moscow, Russia; gusev.g.g@sberbank.ru)
Pseudocode	Yes	Algorithm 1: MVS Algorithm; Algorithm 2: Calculate Threshold
Open Source Code	Yes	The source code of MVS is publicly available [6] and ready to be used as a default option of CatBoost algorithm. ... The MVS source code for LightGBM may be found at [19].
Open Datasets	Yes	Table 1: Datasets description; KDD Internet [1], Adult [25], Amazon [23], KDD Upselling [11], Kick prediction [22], KDD Churn [10], Click prediction [12]. All the datasets are publicly available and were preprocessed according to [5].
Dataset Splits	Yes	For tuning sampling parameters of each algorithm (sample rate and λ coefficient for MVS, large gradients fraction and small gradients fraction for GOSS, sample rate for SGB), we use 5-fold cross-validation on train subset of the data.
Hardware Specification	No	The paper does not specify the exact hardware (e.g., CPU, GPU models, memory) used for running the experiments, only mentioning the use of CatBoost and LightGBM libraries.
Software Dependencies	No	The paper mentions software like CatBoost and LightGBM but does not provide specific version numbers for these or other relevant software dependencies, such as programming languages or libraries.
Experiment Setup	Yes	We implemented MVS in CatBoost and performed benchmark comparison of MVS with sampling ratio 80% and default CatBoost with no sampling... The algorithms were compared by the ROC-AUC metric... For tuning sampling parameters of each algorithm (sample rate and λ coefficient for MVS, large gradients fraction and small gradients fraction for GOSS, sample rate for SGB), we use 5-fold cross-validation on train subset of the data... The evaluation part is run 10 times with different seeds. The final result is defined as the mean over these 10 runs.
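
The evaluation protocol quoted in the Experiment Setup row (5-fold cross-validation for tuning, then 10 seeded runs averaged by ROC-AUC) can be illustrated with a minimal standard-library Python sketch. This is not the authors' code: the helper names (`roc_auc`, `k_fold_indices`, `mean_over_seeds`, `toy_evaluate`) are hypothetical, and `toy_evaluate` scores synthetic noisy data as a stand-in for training a CatBoost or LightGBM model with a given seed.

```python
import random
from statistics import mean


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. O(P * N), fine for a small sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one example of each class")
    correct = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return correct / (len(pos) * len(neg))


def k_fold_indices(n_rows, k=5, seed=0):
    """Shuffle row indices and deal them into k disjoint validation folds."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def mean_over_seeds(evaluate, seeds=range(10)):
    """Call evaluate(seed) once per seed and average the returned scores."""
    return mean(evaluate(seed) for seed in seeds)


def toy_evaluate(seed):
    """Hypothetical stand-in for 'train with this seed, score on test data'."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(200)]
    scores = [y + rng.gauss(0.0, 1.0) for y in labels]  # noisy synthetic scores
    return roc_auc(labels, scores)


if __name__ == "__main__":
    folds = k_fold_indices(n_rows=20, k=5)
    print("validation fold sizes:", [len(f) for f in folds])
    print("mean ROC-AUC over 10 seeds:", round(mean_over_seeds(toy_evaluate), 4))
```

To reproduce the paper's setting, `toy_evaluate` would be replaced by a function that trains the model with the tuned sampling parameters under the given seed and returns its test ROC-AUC. Since the paper reports neither library versions nor hardware, exact scores may differ between environments.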