reproducibilityindex.ai

A Communication-Efficient Parallel Algorithm for Decision Tree

Authors: Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, Tie-Yan Liu

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments on real-world datasets show that PV-Tree significantly outperforms the existing parallel decision tree algorithms in the trade-off between accuracy and efficiency.
Researcher Affiliation	Collaboration	1Peking University 2Microsoft Research 3Chinese Academy of Mathematics and Systems Science
Pseudocode	Yes	Algorithm 1 Bulid Tree, Algorithm 2 Find Best Split, Algorithm 3 PV-Tree_Find Best Split
Open Source Code	Yes	Furthermore, we will open-source PV-Tree algorithm to benefit more researchers and practitioners.
Open Datasets	Yes	We used two data sets, one for learning to rank (LTR) and the other for ad click prediction (CTR)8 (see Table 1 for details). ... 8We use private data in LTR experiments and data of KDD Cup 2012 track 2 in CTR experiments.
Dataset Splits	No	The paper mentions training and testing data sizes in Table 1, but does not provide details on validation splits or methodologies (e.g., k-fold cross-validation, specific percentages for validation sets).
Hardware Specification	Yes	Our experimental environment is a cluster of servers (each with 12 CPU cores and 32 GB RAM) inter-connected with 1 Gbps Ethernet.
Software Dependencies	No	The paper describes the algorithms implemented for comparison (e.g., attribute-parallel, data-parallel) and mentions general machine learning frameworks like GBDT, but it does not specify any particular software libraries or their version numbers used in the implementation or for experiments.
Experiment Setup	Yes	For the experiments on LTR, we used 8 machines for parallel training; and for the experiments on CTR, we used 32 machines since the dataset is much larger. ... In PV-Tree, we have a parameter k, which controls the number of top attributes selected during local and global voting... To gain more insights on this, we conducted some experiments, whose results are shown in Table 4, where M refers to the number of machines.