Cost efficient gradient boosting
Authors: Sven Peter, Ferran Diego, Fred A. Hamprecht, Boaz Nadler
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on a number of datasets and find that it outperforms the current state of the art by a large margin. Our algorithm is easy to implement and its learning time is comparable to that of the original gradient boosting. |
| Researcher Affiliation | Collaboration | Sven Peter, Heidelberg Collaboratory for Image Processing, Interdisciplinary Center for Scientific Computing, University of Heidelberg, 69115 Heidelberg, Germany, sven.peter@iwr.uni-heidelberg.de; Ferran Diego, Robert Bosch GmbH, Robert-Bosch-Straße 200, 31139 Hildesheim, Germany, ferran.diegoandilla@de.bosch.com; Fred A. Hamprecht, Heidelberg Collaboratory for Image Processing, Interdisciplinary Center for Scientific Computing, University of Heidelberg, 69115 Heidelberg, Germany, fred.hamprecht@iwr.uni-heidelberg.de; Boaz Nadler, Department of Computer Science, Weizmann Institute of Science, Rehovot 76100, Israel, boaz.nadler@weizmann.ac.il |
| Pseudocode | No | The paper describes the steps of its algorithm in paragraph form, but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code is made available at http://github.com/svenpeter42/LightGBM-CEGB. |
| Open Datasets | Yes | The Yahoo! Learning to Rank (Yahoo! LTR) challenge dataset [7] consists of 473134 training, 71083 validation and 165660 test document-query pairs... The MiniBooNE dataset [27, 21] consists of 45523 training, 19510 validation and 65031 test instances... The Forest Covertype dataset [3, 21] consists of 36603 training, 15688 validation and 58101 test instances... We additionally use the HEPMASS-1000 and HEPMASS-not1000 datasets [2, 21]. |
| Dataset Splits | Yes | The Yahoo! Learning to Rank (Yahoo! LTR) challenge dataset [7] consists of 473134 training, 71083 validation and 165660 test document-query pairs... The MiniBooNE dataset [27, 21] consists of 45523 training, 19510 validation and 65031 test instances... The Forest Covertype dataset [3, 21] consists of 36603 training, 15688 validation and 58101 test instances... Both datasets contain over ten million instances which we split into 3.5 million training, 1.4 million validation and 5.6 million test instances. (A split sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions that its source code is 'based on LightGBM [17]' but does not provide specific version numbers for LightGBM or any other software dependencies. |
| Experiment Setup | Yes | Regression trees with depth four are constructed and assumed to approximately cost as much as features with feature cost βm = 1. We therefore set the split cost α = 1/4 to allow a fair comparison with our trees which will contain deeper branches. We also use our algorithm to construct trees similar to GREEDYMISER by limiting the trees to 16 leaves with a maximum branch depth of four. (A parameter sketch follows the table.) |
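The split sizes quoted in the Dataset Splits row are stated in the paper, but the sampling procedure is not. Below is a minimal sketch for the Forest Covertype case, assuming scikit-learn's `fetch_covtype` loader and a uniformly random, disjoint subsample of the stated sizes; the authors' exact selection may differ.

```python
import numpy as np
from sklearn.datasets import fetch_covtype

# Split sizes for Forest Covertype as reported in the paper.
N_TRAIN, N_VALID, N_TEST = 36603, 15688, 58101

X, y = fetch_covtype(return_X_y=True)

# Assumption: draw three disjoint subsets uniformly at random;
# the paper does not describe how its subsets were chosen.
rng = np.random.RandomState(0)
idx = rng.permutation(len(X))[: N_TRAIN + N_VALID + N_TEST]
train_idx = idx[:N_TRAIN]
valid_idx = idx[N_TRAIN:N_TRAIN + N_VALID]
test_idx = idx[N_TRAIN + N_VALID:]

X_train, y_train = X[train_idx], y[train_idx]
X_valid, y_valid = X[valid_idx], y[valid_idx]
X_test, y_test = X[test_idx], y[test_idx]
```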
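The Experiment Setup row maps naturally onto the CEGB parameters that were later merged into mainline LightGBM (`cegb_penalty_split`, `cegb_penalty_feature_lazy`). The sketch below is a hedged illustration of that mapping, not the authors' exact configuration: the toy data and the choice of the "lazy" feature-penalty variant are assumptions.

```python
import numpy as np
import lightgbm as lgb

# Toy stand-in data; the paper evaluates on the datasets listed above.
rng = np.random.RandomState(0)
X = rng.rand(5000, 50)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

params = {
    "objective": "binary",
    "num_leaves": 16,  # GreedyMiser-like trees: at most 16 leaves ...
    "max_depth": 4,    # ... with a maximum branch depth of four
    # Cost penalties from the quoted setup (assumed parameter mapping):
    "cegb_penalty_split": 0.25,               # split cost alpha = 1/4
    "cegb_penalty_feature_lazy": [1.0] * 50,  # feature cost beta_m = 1
    "verbosity": -1,
}

booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=50)
```

With a positive `cegb_penalty_split`, deeper branches are only grown where the loss reduction justifies the added prediction cost, which is the accuracy-versus-cost trade-off the paper optimizes.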