A Communication-Efficient Parallel Algorithm for Decision Tree
Authors: Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, Tie-Yan Liu
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on real-world datasets show that PV-Tree significantly outperforms the existing parallel decision tree algorithms in the trade-off between accuracy and efficiency. |
| Researcher Affiliation | Collaboration | Peking University; Microsoft Research; Chinese Academy of Mathematics and Systems Science |
| Pseudocode | Yes | Algorithm 1 Build Tree, Algorithm 2 Find Best Split, Algorithm 3 PV-Tree_Find Best Split |
| Open Source Code | Yes | Furthermore, we will open-source PV-Tree algorithm to benefit more researchers and practitioners. |
| Open Datasets | Yes | We used two data sets, one for learning to rank (LTR) and the other for ad click prediction (CTR) (see Table 1 for details). ... We use private data in LTR experiments and data of KDD Cup 2012 track 2 in CTR experiments. |
| Dataset Splits | No | The paper mentions training and testing data sizes in Table 1, but does not provide details on validation splits or methodologies (e.g., k-fold cross-validation, specific percentages for validation sets). |
| Hardware Specification | Yes | Our experimental environment is a cluster of servers (each with 12 CPU cores and 32 GB RAM) inter-connected with 1 Gbps Ethernet. |
| Software Dependencies | No | The paper describes the algorithms implemented for comparison (e.g., attribute-parallel, data-parallel) and mentions algorithms such as GBDT, but it does not name any specific software libraries or version numbers used in the implementation or experiments. |
| Experiment Setup | Yes | For the experiments on LTR, we used 8 machines for parallel training; and for the experiments on CTR, we used 32 machines since the dataset is much larger. ... In PV-Tree, we have a parameter k, which controls the number of top attributes selected during local and global voting... To gain more insights on this, we conducted some experiments, whose results are shown in Table 4, where M refers to the number of machines. |
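The voting mechanism that the paper's pseudocode (Algorithms 1–3) and the parameter k refer to can be illustrated with a small sketch. This is not the authors' implementation: the gain function, data layout, and function names below are illustrative assumptions; the sketch only mirrors the published idea that each machine votes for its locally best k attributes, the 2k attributes with the most votes survive the global vote, and full-grained gain computation is restricted to those survivors.

```python
import numpy as np

def variance_gain(x, y, threshold):
    """Variance reduction when splitting y by x <= threshold (illustrative gain)."""
    left, right = y[x <= threshold], y[x > threshold]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    n = len(y)
    return np.var(y) - (len(left) / n) * np.var(left) - (len(right) / n) * np.var(right)

def best_gain_per_attribute(X, y):
    """Best split gain for each attribute, scanning midpoints between sorted values."""
    gains = []
    for j in range(X.shape[1]):
        vals = np.unique(X[:, j])
        cand = (vals[:-1] + vals[1:]) / 2 if len(vals) > 1 else vals
        gains.append(max((variance_gain(X[:, j], y, t) for t in cand), default=0.0))
    return np.array(gains)

def pv_tree_find_best_split(partitions, k):
    """PV-Tree-style split search over data partitions (one per 'machine').

    1. Local voting: each machine votes for its top-k attributes by local gain.
    2. Global voting: keep the 2k attributes receiving the most votes.
    3. Full-grained search only over those 2k attributes (here computed exactly
       on the pooled data; the paper uses merged histograms instead).
    """
    n_attr = partitions[0][0].shape[1]
    votes = np.zeros(n_attr, dtype=int)
    for X, y in partitions:
        local_top = np.argsort(best_gain_per_attribute(X, y))[::-1][:k]
        votes[local_top] += 1                       # each machine casts k votes
    global_top = np.argsort(votes)[::-1][:2 * k]    # global majority vote
    X_all = np.vstack([X for X, _ in partitions])
    y_all = np.concatenate([y for _, y in partitions])
    full_gains = best_gain_per_attribute(X_all[:, global_top], y_all)
    return int(global_top[np.argmax(full_gains)])   # winning attribute index
```

The point of the two-stage vote is communication cost: each machine sends only k attribute indices plus histograms for 2k attributes, independent of the total number of attributes, which is what makes the accuracy/efficiency trade-off reported in the paper possible.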