Effective and Efficient Federated Tree Learning on Hybrid Data
Authors: Qinbin Li, Chulin Xie, Xiaojun Xu, Xiaoyuan Liu, Ce Zhang, Bo Li, Bingsheng He, Dawn Song
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that HybridTree can achieve comparable accuracy to the centralized setting with low computational and communication overhead, and up to 8x speedup over the other baselines. We conduct extensive experiments on simulated and natural hybrid federated datasets; HybridTree is much more efficient than the other baselines, with accuracy close to centralized training. |
| Researcher Affiliation | Collaboration | Qinbin Li, UC Berkeley, qinbin@berkeley.edu; Ce Zhang, Together AI and University of Chicago, cez@uchicago.edu |
| Pseudocode | Yes | Algorithm 1: The HybridTree training algorithm; Algorithm 2: Train a single tree in GBDT. |
| Open Source Code | No | The paper does not provide any explicit statement about making the source code for their proposed methodology publicly available, nor does it include a link to a code repository. |
| Open Datasets | Yes | We use four datasets in our experiments: 1) Two versions of hybrid FL datasets provided by the PETs Prize Challenge for anomalous transaction detection... DrivenData. U.S. PETs Prize Challenge: Phase 1. URL https://www.drivendata.org/competitions/98/nist-federated-learning-1/page/524/. ... Adult and cod-rna, from the LIBSVM dataset collection: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ (see the loading sketch after the table). |
| Dataset Splits | No | The paper provides training and test instance counts for each dataset (Table 5) but does not mention a separate validation split or its proportion. |
| Hardware Specification | Yes | We run experiments on a machine with four Intel Xeon Gold 6226R 16-Core CPUs. We fix the number of threads to 10 for each experiment. |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as programming language versions or library versions, needed to replicate the experiment. |
| Experiment Setup | Yes | We train a GBDT model with 50 trees. The learning rate is set to 0.1. The maximum depth is set to 7 for the baselines. The maximum depth for the host is set to 5 and the maximum depth for guests is set to 2 for HybridTree, so that the total depth of the tree is 7 to ensure a fair comparison. The regularization term λ is set to 1. (See the configuration sketch after the table.) |
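The public tabular datasets (Adult and cod-rna) are distributed in LIBSVM format on the page linked above. Below is a minimal loading sketch, assuming scikit-learn and the conventional LIBSVM file names (`a9a`/`a9a.t` for Adult); the paper does not state which exact files the authors used.

```python
# Hedged sketch: load a LIBSVM-format train/test pair with scikit-learn.
# File names follow LIBSVM conventions and are assumptions, not taken
# from the paper.
from sklearn.datasets import load_svmlight_file

def load_libsvm_pair(train_path: str, test_path: str):
    """Load a LIBSVM train/test pair as sparse matrices plus label vectors."""
    X_train, y_train = load_svmlight_file(train_path)
    # Pin the test split to the training feature dimensionality so the
    # two sparse matrices are column-aligned.
    X_test, y_test = load_svmlight_file(test_path, n_features=X_train.shape[1])
    return X_train, y_train, X_test, y_test

# Adult is published as "a9a" (train) and "a9a.t" (test) on the LIBSVM page.
X_tr, y_tr, X_te, y_te = load_libsvm_pair("a9a", "a9a.t")
```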
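For the experiment setup row, a minimal sketch of the centralized GBDT baseline under the reported hyperparameters (50 trees, learning rate 0.1, maximum depth 7, λ = 1, 10 threads) follows. XGBoost and the synthetic placeholder data are assumptions; the paper does not name its GBDT implementation.

```python
# Hedged sketch: a centralized GBDT baseline with the paper's reported
# hyperparameters. XGBoost is an assumed implementation choice.
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; substitute the LIBSVM splits loaded above.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = xgb.XGBClassifier(
    n_estimators=50,    # "We train a GBDT model with 50 trees."
    learning_rate=0.1,  # reported learning rate
    max_depth=7,        # baseline depth; HybridTree splits this as 5 (host) + 2 (guests)
    reg_lambda=1.0,     # regularization term lambda = 1
    n_jobs=10,          # the paper fixes 10 threads per experiment
)
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```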