Practical Federated Gradient Boosting Decision Trees
Authors: Qinbin Li, Zeyi Wen, Bingsheng He
AAAI 2020, pp. 4642-4649
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present the effectiveness and efficiency of SimFL. To understand the model accuracy of SimFL, we compare SimFL with two approaches: 1) SOLO: Each party only trains vanilla GBDTs with its local data. This comparison shows the incentives of using SimFL. 2) ALL-IN: A party trains vanilla GBDTs with the joint data from all parties without the concern of privacy. This comparison demonstrates the potential accuracy loss of achieving the privacy model. We also compare SimFL with the distributed boosting framework proposed by Zhao et al. (2018) (referred to as TFL (Tree-based Federated Learning)). A hypothetical sketch of the SOLO and ALL-IN baselines appears after this table. |
| Researcher Affiliation | Academia | Qinbin Li¹, Zeyi Wen², Bingsheng He¹ (¹National University of Singapore, ²The University of Western Australia); {qinbin, hebs}@comp.nus.edu.sg, zeyi.wen@uwa.edu.au |
| Pseudocode | Yes | Algorithm 1: The preprocessing stage (an illustrative LSH sketch of this stage follows the table) |
| Open Source Code | No | The paper mentions using ThunderGBM and provides a link to its GitHub repository, but it does not provide concrete access to the source code for the SimFL methodology described in this paper. |
| Open Datasets | Yes | We use six public datasets from the LIBSVM website, as listed in Table 1. |
| Dataset Splits | No | The paper states: 'We use 75% of the datasets for training and the remainder for testing.' This describes a train/test split but does not specify a separate validation split or explicit validation methodology. |
| Hardware Specification | Yes | We conducted the experiments on a machine running Linux with two Xeon E5-2640v4 10 core CPUs, 256GB main memory and a Tesla P100 GPU of 12GB memory. |
| Software Dependencies | No | The paper mentions using ThunderGBM but does not provide specific version numbers for this or any other software dependencies. |
| Experiment Setup | Yes | The maximum depth of the trees is set to 8. For the LSH functions, we choose r = 4.0 and L = min{40, d − 1}, where d is the dimension of the dataset. The total number of trees is set to 500 in all approaches. (A hedged configuration sketch follows the table.) |
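
The SOLO and ALL-IN baselines quoted in the Research Type row differ only in which data each GBDT sees. Below is a minimal sketch of that contrast using scikit-learn's gradient boosting in place of the paper's ThunderGBM-based implementation; the `parties` structure and the function names are hypothetical, not taken from the authors' code.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def train_solo(parties):
    """SOLO baseline: each party trains a vanilla GBDT on its own local data."""
    return [
        GradientBoostingClassifier(max_depth=8, n_estimators=500).fit(Xp, yp)
        for Xp, yp in parties  # parties: list of (features, labels) per party
    ]

def train_all_in(parties):
    """ALL-IN baseline: one vanilla GBDT on the pooled data, ignoring privacy."""
    X = np.vstack([Xp for Xp, _ in parties])
    y = np.concatenate([yp for _, yp in parties])
    return GradientBoostingClassifier(max_depth=8, n_estimators=500).fit(X, y)
```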
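Algorithm 1 itself is not reproduced in the table, but the preprocessing stage it names rests on p-stable LSH, where each instance v is hashed as h(v) = ⌊(a·v + b)/r⌋ and instances that collide on many hash functions are treated as similar. The sketch below illustrates that idea under our own naming; it is an approximation of the stage, not the paper's algorithm.

```python
import numpy as np

def lsh_hashes(X, num_hashes, r=4.0, seed=0):
    """Hash every row of X with `num_hashes` p-stable LSH functions,
    h(v) = floor((a . v + b) / r), with a ~ N(0, 1)^d and b ~ U[0, r).
    All parties must share the same seed so their hash functions match."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((num_hashes, X.shape[1]))
    b = rng.uniform(0.0, r, size=num_hashes)
    return np.floor((X @ A.T + b) / r).astype(np.int64)  # shape (n, num_hashes)

def most_similar(local_hashes, remote_hashes):
    """For each local instance, return the index of the remote instance that
    collides on the most hash functions: a stand-in for the similarity
    information the preprocessing stage collects across parties."""
    collisions = (local_hashes[:, None, :] == remote_hashes[None, :, :]).sum(-1)
    return collisions.argmax(axis=1)
```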
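Finally, the reported setup (75%/25% train/test split, maximum tree depth 8, 500 trees, r = 4.0 and L = min{40, d − 1} LSH functions) can be wired together as below. The dataset path and the use of scikit-learn are assumptions for illustration; the paper itself trains with ThunderGBM.

```python
from sklearn.datasets import load_svmlight_file
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Hypothetical path to one of the LIBSVM datasets listed in Table 1.
X, y = load_svmlight_file("data/a9a.libsvm")
X = X.toarray()
d = X.shape[1]

# 75% of the data for training, the remainder for testing (no validation split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, random_state=0)

L = min(40, d - 1)  # number of LSH functions reported in the setup (r = 4.0)

model = GradientBoostingClassifier(max_depth=8, n_estimators=500)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```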