Gradient Boosted Decision Trees for High Dimensional Sparse Output
Authors: Si Si, Huan Zhang, S. Sathiya Keerthi, Dhruv Mahajan, Inderjit S. Dhillon, Cho-Jui Hsieh
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we apply our algorithm to extreme multilabel classification problems, and show that the proposed GBDT-SPARSE achieves an order of magnitude improvements in model size and prediction time over existing methods, while yielding similar performance. |
| Researcher Affiliation | Collaboration | 1Google Research, Mountain View, USA 2University of California at Davis, Davis, USA 3Microsoft, Mountain View, USA 4Facebook, Menlo Park, USA 5University of Texas at Austin, Austin, USA. |
| Pseudocode | Yes | Algorithm 1: GBDT-SPARSE tree node splitting algorithm (an illustrative split-gain sketch follows the table). |
| Open Source Code | No | The paper refers to a link for LightGBM, a baseline method ('https://github.com/Microsoft/LightGBM'), but does not provide a link or statement about the availability of their own GBDT-SPARSE code. |
| Open Datasets | Yes | Data: We conducted experiments on 5 standard and publicly available multi-label learning datasets. Table 2 shows the associated details. NUS-WIDE is available at http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm. All other datasets are available at http://manikvarma.org/downloads/XC/XMLRepository.html. (A minimal reader sketch for this repository format follows the table.) |
| Dataset Splits | No | The paper mentions 'training samples' and 'testing samples' in Table 2, but does not explicitly define a 'validation' dataset split or describe the methodology for creating such a split. |
| Hardware Specification | Yes | All experiments are conducted on a machine with an Intel Xeon X5440 2.83GHz CPU and 32GB RAM. For PD-Sparse we use a similar machine with 192GB memory due to its large memory footprint. We run our algorithm with Delicious-200K on a 28-core dual socket E5-2683v3 machine. |
| Software Dependencies | No | The paper mentions various baselines such as XGBoost, LightGBM, LEML, FastXML, SLEEC, and PD-SPARSE, stating they used 'their highly optimized C++ implementation published along with the original papers,' but does not provide specific version numbers for any software, including their own implementation. |
| Experiment Setup | Yes | For our method, we kept most of the parameters fixed for all the datasets: hmax = 10, nleaf = 100, and λ = 5, where hmax and nleaf are the maximum level of the tree and the minimal number of data points in each leaf. Leaf node sparsity k was set to 100 for Delicious-200K and 20 for all others. This parameter can be very intuitively set as an increasing function of label set size. We hand tuned the projection dimensionality d and set it to 100 for Delicious and Wiki10-31K, and 50 for others. (An illustrative configuration sketch follows the table.) |
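The node-splitting step reported as Algorithm 1 can be pictured with a minimal sketch. Under squared loss, GBDT split selection amounts to scanning (feature, threshold) pairs and measuring the drop in regularized loss, and GBDT-SPARSE additionally restricts each leaf's prediction vector to at most k nonzero coordinates. The Python below is an illustrative simplification under those assumptions, not the authors' optimized C++ implementation: the dense arrays, function names, and exhaustive threshold scan are chosen for readability only.

```python
import numpy as np

def leaf_value(grad_sum, n, lam, k):
    """k-sparse regularized leaf prediction under squared loss (illustrative).

    grad_sum : summed residual/gradient vector of the examples in the leaf (length L)
    n        : number of examples in the leaf
    lam      : L2 regularization strength (lambda in the paper)
    k        : number of nonzero coordinates kept in the leaf prediction
    """
    v = grad_sum / (n + lam)                        # per-coordinate unconstrained minimizer
    if k < v.size:
        keep = np.argpartition(np.abs(v), -k)[-k:]  # indices of the k largest |v_j|
        sparse_v = np.zeros_like(v)
        sparse_v[keep] = v[keep]
        v = sparse_v
    return v

def node_score(grad_sum, n, lam, k):
    """Regularized squared loss of a node, up to a constant independent of v."""
    v = leaf_value(grad_sum, n, lam, k)
    return -2.0 * grad_sum @ v + (n + lam) * (v @ v)

def split_gain(gs_left, n_left, gs_right, n_right, lam, k):
    """Loss reduction obtained by splitting a node into two children."""
    parent = node_score(gs_left + gs_right, n_left + n_right, lam, k)
    return parent - node_score(gs_left, n_left, lam, k) - node_score(gs_right, n_right, lam, k)

def best_split(X, G, lam=5.0, k=20):
    """Scan all (feature, threshold) pairs at a node and return the best split.

    X : (n, d) feature matrix of the examples reaching this node
    G : (n, L) per-example gradient/residual vectors (sparse in practice)
    """
    n, d = X.shape
    total = G.sum(axis=0)
    best = (None, None, 0.0)                        # (feature index, threshold, gain)
    for j in range(d):
        order = np.argsort(X[:, j])
        left_sum, n_left = np.zeros(G.shape[1]), 0
        for idx in order[:-1]:                      # grow the left child one example at a time
            left_sum = left_sum + G[idx]
            n_left += 1
            gain = split_gain(left_sum, n_left, total - left_sum, n - n_left, lam, k)
            if gain > best[2]:
                best = (j, X[idx, j], gain)         # split rule: x_j <= threshold goes left
    return best
```

The key point the sketch tries to convey is that the k-sparsity constraint separates across coordinates, so the sparse leaf value is obtained by keeping the k largest-magnitude coordinates of the usual regularized minimizer.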
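The datasets hosted at the Extreme Classification Repository linked above are typically distributed in a sparse text format: a header line with the number of examples, features, and labels, followed by one example per line as a comma-separated label list and feature:value pairs. The reader below is a minimal sketch under that assumed layout; the function name and the use of SciPy are illustrative choices, not part of the paper.

```python
from scipy.sparse import csr_matrix

def read_xmlc(path):
    """Minimal reader for the sparse text format used by the Extreme
    Classification Repository (assumed layout: a header line
    'n_examples n_features n_labels', then one example per line as
    'lbl,lbl,... feat:val feat:val ...')."""
    with open(path) as f:
        n, n_feat, n_lbl = map(int, f.readline().split())
        x_data, x_ind, x_ptr = [], [], [0]
        y_ind, y_ptr = [], [0]
        for line in f:
            if not line.strip():
                continue
            tokens = line.split()
            # The first token is the label list unless the example has no labels,
            # in which case every token is already a feat:val pair.
            if ":" not in tokens[0]:
                y_ind.extend(int(lbl) for lbl in tokens[0].split(","))
                tokens = tokens[1:]
            for tok in tokens:
                j, v = tok.split(":")
                x_ind.append(int(j))
                x_data.append(float(v))
            x_ptr.append(len(x_ind))
            y_ptr.append(len(y_ind))
    X = csr_matrix((x_data, x_ind, x_ptr), shape=(n, n_feat))
    Y = csr_matrix(([1] * len(y_ind), y_ind, y_ptr), shape=(n, n_lbl))
    return X, Y
```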
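The reported hyperparameter settings can be summarized in a small configuration sketch. The key names below are assumptions made for illustration, not the authors' actual parameter names or command-line flags.

```python
# Illustrative summary of the settings reported in the paper; key names are assumed.
GBDT_SPARSE_SETTINGS = {
    "h_max": 10,    # maximum level of the tree (fixed for all datasets)
    "n_leaf": 100,  # minimal number of data points in each leaf (fixed)
    "lambda": 5.0,  # regularization parameter (fixed)
    "k": 20,        # leaf sparsity: 100 for Delicious-200K, 20 for all other datasets
    "d": 50,        # projection dimensionality: 100 for Delicious and Wiki10-31K, 50 for others
}
```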