Effective and Efficient Federated Tree Learning on Hybrid Data
Authors: Qinbin Li, Chulin Xie, Xiaojun Xu, Xiaoyuan Liu, Ce Zhang, Bo Li, Bingsheng He, Dawn Song
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that HybridTree can achieve comparable accuracy to the centralized setting with low computational and communication overhead, and up to 8x speedup over the other baselines. We conduct extensive experiments on simulated and natural hybrid federated datasets; HybridTree is much more efficient than the other baselines, with accuracy close to centralized training. |
| Researcher Affiliation | Collaboration | Qinbin Li, UC Berkeley, qinbin@berkeley.edu; Ce Zhang, Together AI and University of Chicago, cez@uchicago.edu |
| Pseudocode | Yes | Algorithm 1: The HybridTree training algorithm; Algorithm 2: Train a single tree in GBDT. |
| Open Source Code | No | The paper does not provide any explicit statement about making the source code for their proposed methodology publicly available, nor does it include a link to a code repository. |
| Open Datasets | Yes | We use four datasets in our experiments: 1) Two versions of hybrid FL datasets provided by the PETs Prize Challenge for anomalous transaction detection... DrivenData. U.S. PETs Prize Challenge: Phase 1. URL https://www.drivendata.org/competitions/98/nist-federated-learning-1/page/524/. ... Adult and cod-rna, from the LIBSVM dataset collection: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ (see the loading sketch after the table). |
| Dataset Splits | No | The paper provides training and test instance counts for each dataset (Table 5) but does not mention a separate validation split or its proportion. |
| Hardware Specification | Yes | We run experiments on a machine with four Intel Xeon Gold 6226R 16-Core CPUs. We fix the number of threads to 10 for each experiment. |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as programming language versions or library versions, needed to replicate the experiment. |
| Experiment Setup | Yes | We train a GBDT model with 50 trees. The learning rate is set to 0.1. The maximum depth is set to 7 for the baselines. The maximum depth for the host is set to 5 and the maximum depth for guests is set to 2 for HybridTree, so that the total depth of the tree is 7 to ensure a fair comparison. The regularization term λ is set to 1. (See the configuration sketch after the table.) |
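The public tabular datasets (Adult and cod-rna) are distributed in LIBSVM format on the page linked above. Below is a minimal loading sketch, assuming scikit-learn and the conventional LIBSVM file names (`a9a`/`a9a.t` for Adult); the paper does not state which exact files the authors used.

```python
# Hedged sketch: load a LIBSVM-format train/test pair with scikit-learn.
# File names follow LIBSVM conventions and are assumptions, not taken
# from the paper.
from sklearn.datasets import load_svmlight_file

def load_libsvm_pair(train_path: str, test_path: str):
    """Load a LIBSVM train/test pair as sparse matrices plus label vectors."""
    X_train, y_train = load_svmlight_file(train_path)
    # Pin the test split to the training feature dimensionality so the
    # two sparse matrices are column-aligned.
    X_test, y_test = load_svmlight_file(test_path, n_features=X_train.shape[1])
    return X_train, y_train, X_test, y_test

# Adult is published as "a9a" (train) and "a9a.t" (test) on the LIBSVM page.
X_tr, y_tr, X_te, y_te = load_libsvm_pair("a9a", "a9a.t")
```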
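For the experiment setup row, a minimal sketch of the centralized GBDT baseline under the reported hyperparameters (50 trees, learning rate 0.1, maximum depth 7, λ = 1, 10 threads) follows. XGBoost and the synthetic placeholder data are assumptions; the paper does not name its GBDT implementation.

```python
# Hedged sketch: a centralized GBDT baseline with the paper's reported
# hyperparameters. XGBoost is an assumed implementation choice.
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; substitute the LIBSVM splits loaded above.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = xgb.XGBClassifier(
    n_estimators=50,    # "We train a GBDT model with 50 trees."
    learning_rate=0.1,  # reported learning rate
    max_depth=7,        # baseline depth; HybridTree splits this as 5 (host) + 2 (guests)
    reg_lambda=1.0,     # regularization term lambda = 1
    n_jobs=10,          # the paper fixes 10 threads per experiment
)
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```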