Effective and Efficient Federated Tree Learning on Hybrid Data

Authors: Qinbin Li, Chulin Xie, Xiaojun Xu, Xiaoyuan Liu, Ce Zhang, Bo Li, Bingsheng He, Dawn Song

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate that HybridTree can achieve comparable accuracy to the centralized setting with low computational and communication overhead. HybridTree can achieve up to 8 times speedup compared with the other baselines. We conduct extensive experiments on simulated and natural hybrid federated datasets. Our experiments show that HybridTree is much more efficient than the other baselines with accuracy close to centralized training.
Researcher Affiliation | Collaboration | Qinbin Li (UC Berkeley) qinbin@berkeley.edu; Ce Zhang (Together AI, University of Chicago) cez@uchicago.edu
Pseudocode | Yes | Algorithm 1: The HybridTree training algorithm; Algorithm 2: Train a single tree in GBDT. (A minimal GBDT sketch appears after this table.)
Open Source Code | No | The paper does not provide any explicit statement about making the source code for its proposed method publicly available, nor does it include a link to a code repository.
Open Datasets | Yes | We use four datasets in our experiments: 1) Two versions of hybrid FL datasets provided by the PETs Prize Challenge for anomalous transaction detection... DrivenData. U.S. PETs Prize Challenge: Phase 1. URL https://www.drivendata.org/competitions/98/nist-federated-learning-1/page/524/. ... Adult and Cod-rna [1] ... [1] https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ (A dataset-loading sketch appears after this table.)
Dataset Splits | No | The paper provides counts for training and test instances for each dataset (Table 5) but does not explicitly mention or detail a separate validation split or its proportion.
Hardware Specification | Yes | We run experiments on a machine with four Intel Xeon Gold 6226R 16-Core CPUs. We fix the number of threads to 10 for each experiment.
Software Dependencies | No | The paper does not provide specific software dependency details, such as programming language or library versions, needed to replicate the experiments.
Experiment Setup | Yes | We train a GBDT model with 50 trees. The learning rate is set to 0.1. The maximum depth is set to 7 for the baselines. For HybridTree, the maximum depth is set to 5 for the host and 2 for the guests so that the total tree depth is 7, ensuring a fair comparison. The regularization term λ is set to 1.
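
The Pseudocode row references Algorithm 2 (training a single tree in GBDT). The paper's layer-wise host/guest protocol is not reproduced here; the following is only a minimal sketch of the standard second-order split gain and leaf weight used in GBDT (XGBoost-style), with the regularization term λ written as lam. All function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def leaf_weight(g, h, lam=1.0):
    """Optimal leaf value under L2-regularized second-order GBDT: -G / (H + lambda)."""
    return -g.sum() / (h.sum() + lam)

def split_gain(g, h, mask, lam=1.0):
    """Gain of splitting instances into left (mask) and right (~mask) children."""
    gl, hl = g[mask].sum(), h[mask].sum()
    gr, hr = g[~mask].sum(), h[~mask].sum()
    parent = (gl + gr) ** 2 / (hl + hr + lam)
    return 0.5 * (gl ** 2 / (hl + lam) + gr ** 2 / (hr + lam) - parent)

def best_split(X, g, h, lam=1.0):
    """Exhaustively scan features and thresholds for the highest-gain split."""
    best = (None, None, 0.0)  # (feature index, threshold, gain)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            mask = X[:, j] <= thr
            if mask.all() or not mask.any():
                continue  # degenerate split, skip
            gain = split_gain(g, h, mask, lam)
            if gain > best[2]:
                best = (j, thr, gain)
    return best
```

A tree is grown by applying best_split recursively until the maximum depth is reached, then assigning leaf_weight to each leaf.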
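For the Adult and Cod-rna datasets cited in the Open Datasets row, one plausible loading path is scikit-learn's load_svmlight_file, since the LIBSVM page distributes both in svmlight format. The file names below (a9a, cod-rna) are assumptions about which variants were used; download the files from the LIBSVM page first.

```python
from sklearn.datasets import load_svmlight_file

# Load the LIBSVM-format files downloaded from the datasets page cited above.
X_adult, y_adult = load_svmlight_file("a9a")      # Adult (assumed a9a variant)
X_codrna, y_codrna = load_svmlight_file("cod-rna")
print(X_adult.shape, X_codrna.shape)
```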
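The Experiment Setup row lists the GBDT hyperparameters (50 trees, learning rate 0.1, maximum depth 7, λ = 1, 10 threads). The paper does not name the training library, so the sketch below expresses those values as an XGBoost configuration purely for illustration; the objective and the dummy data are assumptions standing in for the real datasets.

```python
import numpy as np
import xgboost as xgb

params = {
    "eta": 0.1,           # learning rate
    "max_depth": 7,       # baseline depth (HybridTree: host depth 5 + guest depth 2)
    "lambda": 1.0,        # L2 regularization term
    "nthread": 10,        # threads fixed to 10 per experiment
    "objective": "binary:logistic",  # assumed; the reported tasks are binary classification
}

# Dummy data so the snippet runs standalone; replace with the real datasets.
X = np.random.rand(200, 10)
y = (np.random.rand(200) > 0.5).astype(int)
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train(params, dtrain, num_boost_round=50)  # 50 trees
```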