Minimal Variance Sampling in Stochastic Gradient Boosting
Authors: Bulat Ibragimov, Gleb Gusev
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we provide experimental results of the MVS algorithm on two popular open-source implementations of gradient boosting: CatBoost and LightGBM. The results show a significant improvement over the existing default: 97 wins of MVS versus 55 wins of the default setting, and a +0.12% mean ROC-AUC improvement. |
| Researcher Affiliation | Collaboration | Bulat Ibragimov: Yandex, Moscow, Russia; Moscow Institute of Physics and Technology; ibrbulat@yandex.ru. Gleb Gusev: Sberbank, Moscow, Russia; gusev.g.g@sberbank.ru |
| Pseudocode | Yes | Algorithm 1 MVS Algorithm; Algorithm 2 Calculate Threshold |
| Open Source Code | Yes | The source code of MVS is publicly available [6] and ready to be used as a default option of the CatBoost algorithm. ... The MVS source code for LightGBM may be found at [19]. |
| Open Datasets | Yes | Table 1: Datasets description; KDD Internet [1], Adult [25], Amazon [23], KDD Upselling [11], Kick prediction [22], KDD Churn [10], Click prediction [12]. All the datasets are publicly available and were preprocessed according to [5]. |
| Dataset Splits | Yes | For tuning the sampling parameters of each algorithm (sample rate and λ coefficient for MVS, large-gradients fraction and small-gradients fraction for GOSS, sample rate for SGB), we use 5-fold cross-validation on the train subset of the data. |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., CPU or GPU models, memory) used for running the experiments, only mentioning the use of the CatBoost and LightGBM libraries. |
| Software Dependencies | No | The paper mentions software such as CatBoost and LightGBM but does not provide version numbers for these or other relevant dependencies, such as programming languages or libraries. |
| Experiment Setup | Yes | We implemented MVS in CatBoost and performed a benchmark comparison of MVS with a sampling ratio of 80% against default CatBoost with no sampling... The algorithms were compared by the ROC-AUC metric... For tuning the sampling parameters of each algorithm (sample rate and λ coefficient for MVS, large-gradients fraction and small-gradients fraction for GOSS, sample rate for SGB), we use 5-fold cross-validation on the train subset of the data... The evaluation part is run 10 times with different seeds. The final result is defined as the mean over these 10 runs. |
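For context on what the paper's Algorithm 1 (MVS) and Algorithm 2 (Calculate Threshold) compute, here is a minimal NumPy sketch of the sampling scheme as described in the paper: each instance is scored by a regularized gradient `sqrt(g_i^2 + λ h_i^2)`, kept with probability `min(1, score / μ)` where the threshold `μ` is chosen so the expected sample size matches the target rate, and reweighted by the inverse keep probability. The bisection-based threshold search and the function name `mvs_sample` are illustrative assumptions, not the authors' code.

```python
import numpy as np

def mvs_sample(grads, hessians, sample_rate, lam, rng=None):
    """Sketch of Minimal Variance Sampling (hypothetical helper, not the paper's code).

    Scores each instance by a regularized gradient sqrt(g^2 + lam * h^2),
    keeps instance i with probability p_i = min(1, score_i / mu), and
    reweights kept instances by 1 / p_i so gradient estimates stay unbiased.
    """
    rng = rng or np.random.default_rng()
    score = np.sqrt(grads ** 2 + lam * hessians ** 2)
    n = len(score)
    target = sample_rate * n

    # Bisection for the threshold mu such that sum(min(1, score / mu)) ~= target.
    lo, hi = 0.0, score.max() * n
    for _ in range(100):
        mu = 0.5 * (lo + hi)
        expected = np.minimum(1.0, score / mu).sum()
        if expected > target:
            lo = mu  # keeping too many instances -> raise the threshold
        else:
            hi = mu

    probs = np.minimum(1.0, score / (0.5 * (lo + hi)))
    keep = rng.random(n) < probs
    weights = np.where(keep, 1.0 / probs, 0.0)  # inverse-probability weights
    return keep, weights
```

Used per boosting iteration, the kept indices and weights would feed the tree-building step in place of the full training set.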