Gradient Boosted Decision Trees for High Dimensional Sparse Output

Authors: Si Si, Huan Zhang, S. Sathiya Keerthi, Dhruv Mahajan, Inderjit S. Dhillon, Cho-Jui Hsieh

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we apply our algorithm to extreme multilabel classification problems, and show that the proposed GBDT-SPARSE achieves an order of magnitude improvements in model size and prediction time over existing methods, while yielding similar performance."
Researcher Affiliation | Collaboration | Google Research, Mountain View, USA; University of California at Davis, Davis, USA; Microsoft, Mountain View, USA; Facebook, Menlo Park, USA; University of Texas at Austin, Austin, USA.
Pseudocode | Yes | Algorithm 1: GBDT-SPARSE tree node splitting algorithm.
Open Source Code | No | The paper links to LightGBM, a baseline method (https://github.com/Microsoft/LightGBM), but does not provide a link or any statement about the availability of the authors' own GBDT-SPARSE code.
Open Datasets | Yes | "Data: We conducted experiments on 5 standard and publicly available multi-label learning datasets. Table 2 shows the associated details." NUS-WIDE is available at http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm; all other datasets are available at http://manikvarma.org/downloads/XC/XMLRepository.html.
Dataset Splits | No | The paper reports 'training samples' and 'testing samples' in Table 2, but does not define a validation split or describe the methodology for creating one.
Hardware Specification | Yes | "All experiments are conducted on a machine with an Intel Xeon X5440 2.83GHz CPU and 32GB RAM. For PD-Sparse we use a similar machine with 192GB memory due to its large memory footprint. We run our algorithm with Delicious-200K on a 28-core dual-socket E5-2683v3 machine."
Software Dependencies | No | The paper lists baselines such as XGBoost, LightGBM, LEML, FASTXML, SLEEC, and PD-SPARSE, stating that the authors used "their highly optimized C++ implementation published along with the original papers," but does not provide version numbers for any software, including their own implementation.
Experiment Setup | Yes | "For our method, we kept most of the parameters fixed for all the datasets: hmax = 10, nleaf = 100, and λ = 5, where hmax and nleaf are the maximum level of the tree and the minimal number of data points in each leaf. Leaf node sparsity k was set to 100 for Delicious-200K and 20 for all others. This parameter can be very intuitively set as an increasing function of label set size. We hand tuned the projection dimensionality d and set it to 100 for Delicious and Wiki10-31K, and 50 for others."
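
The hyperparameters quoted in the Experiment Setup row can be summarized in a small helper. The snippet below is a minimal sketch in Python assembled from that quote alone; the names (COMMON_PARAMS, leaf_sparsity_k, projection_dim_d, config_for) and the dictionary layout are assumptions rather than part of the paper or any released code, and the value of d for Delicious-200K is inferred from the phrase "50 for others".

```python
# Minimal sketch of the per-dataset hyperparameters reported for GBDT-SPARSE.
# Names and structure are assumptions; the paper does not release an implementation.

# Parameters kept fixed across all datasets in the paper.
COMMON_PARAMS = {
    "hmax": 10,       # maximum level (depth) of each tree
    "nleaf": 100,     # minimal number of data points in each leaf
    "lambda_reg": 5,  # regularization weight (lambda in the paper)
}

def leaf_sparsity_k(dataset: str) -> int:
    # "Leaf node sparsity k was set to 100 for Delicious-200K and 20 for all others."
    return 100 if dataset == "Delicious-200K" else 20

def projection_dim_d(dataset: str) -> int:
    # "We hand tuned the projection dimensionality d and set it to 100 for
    #  Delicious and Wiki10-31K, and 50 for others."
    # (d for Delicious-200K is assumed to fall under "others", i.e. 50.)
    return 100 if dataset in ("Delicious", "Wiki10-31K") else 50

def config_for(dataset: str) -> dict:
    """Combine the shared and per-dataset hyperparameters for one dataset."""
    return {**COMMON_PARAMS,
            "k": leaf_sparsity_k(dataset),
            "d": projection_dim_d(dataset)}

if __name__ == "__main__":
    print(config_for("Wiki10-31K"))
    # {'hmax': 10, 'nleaf': 100, 'lambda_reg': 5, 'k': 20, 'd': 100}
```

Under the same rules, any other dataset in Table 2 would receive k = 20 and d = 50.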