Uncertainty in Gradient Boosting via Ensembles
Authors: Andrey Malinin, Liudmila Prokhorenkova, Aleksei Ustimenko
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted experiments on a range of synthetic and real datasets and investigated the applicability of ensemble approaches to gradient boosting models that are themselves ensembles of decision trees. Our analysis shows that ensembles of gradient boosting models successfully detect anomalous inputs while having limited ability to improve the predicted total uncertainty. |
| Researcher Affiliation | Collaboration | Andrey Malinin (Yandex; HSE University, Moscow, Russia) am969@yandex-team.ru; Liudmila Prokhorenkova (Yandex; HSE University; Moscow Institute of Physics and Technology, Moscow, Russia) ostroumova-la@yandex-team.ru; Aleksei Ustimenko (Yandex, Moscow, Russia) austimenko@yandex-team.ru |
| Pseudocode | No | The paper describes algorithms and mathematical formulations but does not present a clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Our methods have been implemented within the open-source CatBoost library. The code of our experiments is publicly available at https://github.com/yandex-research/GBDT-uncertainty. |
| Open Datasets | Yes | We compare the algorithms on several classification and regression tasks (Gal & Ghahramani, 2016; Prokhorenkova et al., 2018), the description of which is available in Appendix A.3. The datasets are described in Table 3. For regression, we use the standard train/validation/test splits (UCI). For classification, we split the datasets in the proportion 65/15/20 into train, validation, and test sets. For more details, see our GitHub repository. |
| Dataset Splits | Yes | For regression, we use the standard train/validation/test splits (UCI). For classification, we split the datasets in the proportion 65/15/20 into train, validation, and test sets. |
| Hardware Specification | No | The paper mentions training models but does not specify any hardware details such as GPU/CPU models, memory, or cloud instance types used for the experiments. |
| Software Dependencies | No | The paper mentions using the 'CatBoost library' and 'scikit-learn implementation' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Hyper-parameters are tuned by grid search, for details see Appendix A.2. For all approaches, we use grid search to tune learning-rate in {0.001, 0.01, 0.1}, tree depth in {3, 4, 5, 6}. We fix subsample to 0.5 for SGB and to 1 for SGLB. |
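
The "Research Type" and "Open Source Code" rows refer to ensembles of gradient boosting models used for uncertainty estimation, implemented in the CatBoost library. Below is a minimal sketch of that general idea: an explicit ensemble of independently seeded CatBoost models whose disagreement serves as an uncertainty signal. The dataset, iteration count, and hyperparameter values here are placeholders for illustration, not the authors' exact SGB/SGLB configuration.

```python
import numpy as np
from catboost import CatBoostRegressor
from sklearn.datasets import make_regression  # placeholder data, not from the paper

X, y = make_regression(n_samples=1000, n_features=8, noise=10.0, random_state=0)

# Train an ensemble of independently seeded gradient boosting models.
ensemble = []
for seed in range(10):
    model = CatBoostRegressor(iterations=500, learning_rate=0.1, depth=4,
                              random_seed=seed, verbose=False)
    model.fit(X, y)
    ensemble.append(model)

# Mean prediction and across-model variance as a proxy for knowledge uncertainty.
preds = np.stack([m.predict(X) for m in ensemble])
mean_pred = preds.mean(axis=0)
knowledge_uncertainty = preds.var(axis=0)
```

High across-model variance on an input flags it as potentially anomalous, which is the behavior the paper evaluates on out-of-domain detection tasks.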
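The "Dataset Splits" row reports a 65/15/20 train/validation/test proportion for the classification datasets. A minimal sketch of producing such a split with scikit-learn follows; the synthetic dataset and random seed are assumptions, not taken from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# First carve off 35% of the data, then split that into 15% validation / 20% test.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.35, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=20 / 35,
                                                random_state=0)
```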
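The "Experiment Setup" row quotes a grid search over learning rate in {0.001, 0.01, 0.1} and tree depth in {3, 4, 5, 6}, with subsample fixed to 0.5 for SGB. A hedged sketch of that selection loop with CatBoost is shown below; the iteration count and the use of validation RMSE as the selection metric are assumptions, not details confirmed by the paper.

```python
import itertools
import numpy as np
from catboost import CatBoostRegressor

def grid_search(X_train, y_train, X_val, y_val):
    """Pick the (learning_rate, depth) pair with the best validation RMSE."""
    best_params, best_rmse = None, float("inf")
    for lr, depth in itertools.product([0.001, 0.01, 0.1], [3, 4, 5, 6]):
        model = CatBoostRegressor(
            iterations=1000,   # assumed; not specified in the quoted setup
            learning_rate=lr,
            depth=depth,
            subsample=0.5,     # fixed to 0.5 for SGB in the paper
            loss_function="RMSE",
            verbose=False,
        )
        model.fit(X_train, y_train, eval_set=(X_val, y_val))
        rmse = np.sqrt(np.mean((model.predict(X_val) - y_val) ** 2))
        if rmse < best_rmse:
            best_params, best_rmse = (lr, depth), rmse
    return best_params, best_rmse
```

The validation portion of a split such as the one sketched above is what the grid search scores against; the held-out test set is only used after the hyperparameters are fixed.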