Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Quantized Training of Gradient Boosting Decision Trees
Authors: Yu Shi, Guolin Ke, Zhuoming Chen, Shuxin Zheng, Tie-Yan Liu
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Surprisingly, both our theoretical analysis and empirical studies show that the necessary precisions of gradients without hurting any performance can be quite low, e.g., 2 or 3 bits. Benchmarked on CPUs, GPUs, and distributed clusters, we observe up to 2× speedup of our simple quantization strategy compared with SOTA GBDT systems on extensive datasets, demonstrating the effectiveness and potential of the low-precision training of GBDT. |
| Researcher Affiliation | Collaboration | ¹Microsoft Research, ²DP Technology, ³Tsinghua University |
| Pseudocode | Yes | Algorithm 1: Histogram Construction for Leaf s |
| Open Source Code | No | The code will be released to the official repository of LightGBM. |
| Open Datasets | Yes | Table 1: Datasets used in experiments (columns: Name, #Train, #Test, #Attribute, Task, Metric), with footnotes 5 (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#epsilon), 6 (https://go.criteo.net/criteo-research-kaggle-display-advertising-challenge-dataset.tar.gz), 7 (https://www.kaggle.com/c/bosch-production-line-performance), and 8 (https://webscope.sandbox.yahoo.com/catalog.php?datatype=c), along with citations such as Higgs [2], Kitsune [21], Year [3], LETOR [25]. |
| Dataset Splits | No | The paper mentions 'test set' but does not provide specific details about a validation set or how data was split into training, validation, and test subsets for reproducibility. |
| Hardware Specification | Yes | No-packing version on GPU requires atomic addition for 16-bit integers, which is not natively supported by NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions software like LightGBM, XGBoost, CatBoost, and CUDA, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | A full description of datasets and hyperparameter settings is provided in Appendix C. |