Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Minimal Variance Sampling in Stochastic Gradient Boosting
Authors: Bulat Ibragimov, Gleb Gusev
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we provide experimental results of MVS algorithm on two popular open-source implementations of gradient boosting: CatBoost and LightGBM. The results show significant improvement over the existing default: 97 wins of MVS versus 55 wins of default setting and +0.12% mean ROC-AUC improvement. |
| Researcher Affiliation | Collaboration | Bulat Ibragimov: Yandex, Moscow, Russia; Moscow Institute of Physics and Technology; EMAIL. Gleb Gusev: Sberbank, Moscow, Russia; EMAIL. |
| Pseudocode | Yes | Algorithm 1 MVS Algorithm; Algorithm 2 Calculate Threshold |
| Open Source Code | Yes | The source code of MVS is publicly available [6] and ready to be used as a default option of CatBoost algorithm. ... The MVS source code for LightGBM may be found at [19]. |
| Open Datasets | Yes | Table 1: Datasets description; KDD Internet [1], Adult [25], Amazon [23], KDD Upselling [11], Kick prediction [22], KDD Churn [10], Click prediction [12]. All the datasets are publicly available and were preprocessed according to [5]. |
| Dataset Splits | Yes | For tuning sampling parameters of each algorithm (sample rate and λ coefficient for MVS, large gradients fraction and small gradients fraction for GOSS, sample rate for SGB), we use 5-fold cross-validation on train subset of the data. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., CPU or GPU models, memory) used to run the experiments; it mentions only the use of the CatBoost and LightGBM libraries. |
| Software Dependencies | No | The paper mentions CatBoost and LightGBM but does not give version numbers for them or for any other software dependencies, such as programming languages or libraries. |
| Experiment Setup | Yes | We implemented MVS in CatBoost and performed benchmark comparison of MVS with sampling ratio 80% and default CatBoost with no sampling... The algorithms were compared by the ROC-AUC metric... For tuning sampling parameters of each algorithm (sample rate and λ coefficient for MVS, large gradients fraction and small gradients fraction for GOSS, sample rate for SGB), we use 5-fold cross-validation on train subset of the data... The evaluation part is run 10 times with different seeds. The final result is defined as the mean over these 10 runs. |
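
The evaluation protocol quoted in the Experiment Setup row (10 runs with different seeds, final result taken as the mean ROC-AUC) can be illustrated with a minimal sketch. This is not the authors' code: `evaluate_once` is a hypothetical stand-in that scores synthetic data instead of training CatBoost or LightGBM, and `roc_auc` and `mean_auc_over_seeds` are illustrative helper names. Only the Python standard library is assumed.

```python
import random
from statistics import mean


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate_once(seed):
    """Hypothetical stand-in for one seeded train-and-evaluate run.

    A real run would train a model with this seed and score a held-out
    test set; here a toy noisy scorer produces the predictions.
    """
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(200)]
    scores = [y + rng.gauss(0, 1) for y in labels]
    return roc_auc(labels, scores)


def mean_auc_over_seeds(evaluate, seeds):
    """Repeat the evaluation once per seed and report the mean ROC-AUC."""
    return mean(evaluate(seed) for seed in seeds)


# Ten runs with different seeds, averaged, as in the quoted setup.
result = mean_auc_over_seeds(evaluate_once, range(10))
print(f"mean ROC-AUC over 10 seeds: {result:.4f}")
```

Averaging over seeds reduces the influence of any single random initialization or sampling draw on the reported score, which matters when the differences between methods are small.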