Cost efficient gradient boosting
Authors: Sven Peter, Ferran Diego, Fred A. Hamprecht, Boaz Nadler
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on a number of datasets and find that it outperforms the current state of the art by a large margin. Our algorithm is easy to implement and its learning time is comparable to that of the original gradient boosting. |
| Researcher Affiliation | Collaboration | Sven Peter, Heidelberg Collaboratory for Image Processing, Interdisciplinary Center for Scientific Computing, University of Heidelberg, 69115 Heidelberg, Germany, sven.peter@iwr.uni-heidelberg.de; Ferran Diego, Robert Bosch GmbH, Robert-Bosch-Straße 200, 31139 Hildesheim, Germany, ferran.diegoandilla@de.bosch.com; Fred A. Hamprecht, Heidelberg Collaboratory for Image Processing, Interdisciplinary Center for Scientific Computing, University of Heidelberg, 69115 Heidelberg, Germany, fred.hamprecht@iwr.uni-heidelberg.de; Boaz Nadler, Department of Computer Science, Weizmann Institute of Science, Rehovot 76100, Israel, boaz.nadler@weizmann.ac.il |
| Pseudocode | No | The paper describes the steps of its algorithm in paragraph form, but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code is made available at http://github.com/svenpeter42/LightGBM-CEGB. |
| Open Datasets | Yes | The Yahoo! Learning to Rank (Yahoo! LTR) challenge dataset [7] consists of 473134 training, 71083 validation and 165660 test document-query pairs... The MiniBooNE dataset [27, 21] consists of 45523 training, 19510 validation and 65031 test instances... The Forest Covertype dataset [3, 21] consists of 36603 training, 15688 validation and 58101 test instances... We additionally use the HEPMASS-1000 and HEPMASS-not1000 datasets [2, 21]. |
| Dataset Splits | Yes | The Yahoo! Learning to Rank (Yahoo! LTR) challenge dataset [7] consists of 473134 training, 71083 validation and 165660 test document-query pairs... The MiniBooNE dataset [27, 21] consists of 45523 training, 19510 validation and 65031 test instances... The Forest Covertype dataset [3, 21] consists of 36603 training, 15688 validation and 58101 test instances... Both datasets contain over ten million instances which we split into 3.5 million training, 1.4 million validation and 5.6 million test instances. (A split sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions that its source code is 'based on LightGBM [17]' but does not provide specific version numbers for LightGBM or any other software dependencies. |
| Experiment Setup | Yes | Regression trees with depth four are constructed and assumed to approximately cost as much as features with feature cost βm = 1. We therefore set the split cost α = 1/4 to allow a fair comparison with our trees which will contain deeper branches. We also use our algorithm to construct trees similar to GREEDYMISER by limiting the trees to 16 leaves with a maximum branch depth of four. (A parameter sketch follows the table.) |
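The split sizes quoted in the Dataset Splits row are stated in the paper, but the sampling procedure is not. Below is a minimal sketch for the Forest Covertype case, assuming scikit-learn's `fetch_covtype` loader and a uniformly random, disjoint subsample of the stated sizes; the authors' exact selection may differ.

```python
import numpy as np
from sklearn.datasets import fetch_covtype

# Split sizes for Forest Covertype as reported in the paper.
N_TRAIN, N_VALID, N_TEST = 36603, 15688, 58101

X, y = fetch_covtype(return_X_y=True)

# Assumption: draw three disjoint subsets uniformly at random;
# the paper does not describe how its subsets were chosen.
rng = np.random.RandomState(0)
idx = rng.permutation(len(X))[: N_TRAIN + N_VALID + N_TEST]
train_idx = idx[:N_TRAIN]
valid_idx = idx[N_TRAIN:N_TRAIN + N_VALID]
test_idx = idx[N_TRAIN + N_VALID:]

X_train, y_train = X[train_idx], y[train_idx]
X_valid, y_valid = X[valid_idx], y[valid_idx]
X_test, y_test = X[test_idx], y[test_idx]
```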
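The Experiment Setup row maps naturally onto the CEGB parameters that were later merged into mainline LightGBM (`cegb_penalty_split`, `cegb_penalty_feature_lazy`). The sketch below is a hedged illustration of that mapping, not the authors' exact configuration: the toy data and the choice of the "lazy" feature-penalty variant are assumptions.

```python
import numpy as np
import lightgbm as lgb

# Toy stand-in data; the paper evaluates on the datasets listed above.
rng = np.random.RandomState(0)
X = rng.rand(5000, 50)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

params = {
    "objective": "binary",
    "num_leaves": 16,  # GreedyMiser-like trees: at most 16 leaves ...
    "max_depth": 4,    # ... with a maximum branch depth of four
    # Cost penalties from the quoted setup (assumed parameter mapping):
    "cegb_penalty_split": 0.25,               # split cost alpha = 1/4
    "cegb_penalty_feature_lazy": [1.0] * 50,  # feature cost beta_m = 1
    "verbosity": -1,
}

booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=50)
```

With a positive `cegb_penalty_split`, deeper branches are only grown where the loss reduction justifies the added prediction cost, which is the accuracy-versus-cost trade-off the paper optimizes.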