A Communication-Efficient Parallel Algorithm for Decision Tree

Authors: Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, Tie-Yan Liu

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiments on real-world datasets show that PV-Tree significantly outperforms the existing parallel decision tree algorithms in the trade-off between accuracy and efficiency.
Researcher Affiliation Collaboration 1Peking University 2Microsoft Research 3Chinese Academy of Mathematics and Systems Science
Pseudocode Yes Algorithm 1 Bulid Tree, Algorithm 2 Find Best Split, Algorithm 3 PV-Tree_Find Best Split
Open Source Code Yes Furthermore, we will open-source PV-Tree algorithm to benefit more researchers and practitioners.
Open Datasets Yes We used two data sets, one for learning to rank (LTR) and the other for ad click prediction (CTR)8 (see Table 1 for details). ... 8We use private data in LTR experiments and data of KDD Cup 2012 track 2 in CTR experiments.
Dataset Splits No The paper mentions training and testing data sizes in Table 1, but does not provide details on validation splits or methodologies (e.g., k-fold cross-validation, specific percentages for validation sets).
Hardware Specification Yes Our experimental environment is a cluster of servers (each with 12 CPU cores and 32 GB RAM) inter-connected with 1 Gbps Ethernet.
Software Dependencies No The paper describes the algorithms implemented for comparison (e.g., attribute-parallel, data-parallel) and mentions general machine learning frameworks like GBDT, but it does not specify any particular software libraries or their version numbers used in the implementation or for experiments.
Experiment Setup Yes For the experiments on LTR, we used 8 machines for parallel training; and for the experiments on CTR, we used 32 machines since the dataset is much larger. ... In PV-Tree, we have a parameter k, which controls the number of top attributes selected during local and global voting... To gain more insights on this, we conducted some experiments, whose results are shown in Table 4, where M refers to the number of machines.