On the Gini-impurity Preservation for Privacy Random Forests
Authors: Xin-Ran Xie, Man-Jie Yuan, Xue-Tong Bai, Wei Gao, Zhi-Hua Zhou
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We finally present extensive empirical studies to validate the effectiveness, efficiency and security of our proposed method. [...] Extensive experiments show that our encrypted random forests take significantly better performance than prior privacy random forests via encryption, anonymization and differential privacy, and are comparable to original (plaintexts) random forests without encryption. |
| Researcher Affiliation | Academia | Xin-Ran Xie, Man-Jie Yuan, Xue-Tong Bai, Wei Gao, Zhi-Hua Zhou. National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China. {xiexr,yuanmj,baixt,gaow,zhouzh}@lamda.nju.edu.cn |
| Pseudocode | Yes | Algorithm 1 The Gini-impurity-preserving encryption [...] Algorithm 2 Splitting a node for encryption [...] Algorithm 3 Finding the best splitting feature and position |
| Open Source Code | No | The paper does not provide an explicit statement or link to its own source code. It only mentions where the code for comparison methods was downloaded from. |
| Open Datasets | Yes | We conduct experiments on 20 datasets (downloaded from www.openml.org) as summarized in Table 2. Most datasets have been well-studied in previous random forests. |
| Dataset Splits | Yes | The performance is evaluated by five trials of 5-fold cross validation, and final prediction accuracies are obtained by averaging over these 25 runs, as summarized in Table 3. (See the cross-validation sketch below the table.) |
| Hardware Specification | Yes | All experiments are performed by c++ on the Ubuntu with 256GB main memory (AMD Ryzen Threadripper 3970X). |
| Software Dependencies | No | The paper mentions 'c++ on the Ubuntu' but does not provide specific version numbers for compilers, libraries, or other software dependencies beyond the operating system. |
| Experiment Setup | Yes | For all random forests, we train 100 individual decision trees, and randomly select √d candidate features during node splitting. We set α = 10 for datasets of size smaller than 20,000 for our encrypted random forests; otherwise, set α = 100, following [95]. For multi-class datasets, we take the one-vs-all method for Mul PRFs, since it is limited to binary classification. Other parameters are set according to their respective references, and more details can be found in Appendix D. [...] Tables 4 and 5 summarize some hyperparameter settings in our experiments. Except for parameters n_estimators and α in leaf splitting, other parameters are set according to their respective references. We set security parameter λ > 6.4 according to privacy-preserving requisites as in [89]. (See the configuration sketch below the table.) |
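
For readers who want to mirror the evaluation protocol quoted in the Dataset Splits row, the following is a minimal sketch of five trials of 5-fold cross-validation with accuracies averaged over all 25 runs. It assumes scikit-learn's `RandomForestClassifier` as a plaintext stand-in for the paper's encrypted forests, and the OpenML dataset name is illustrative, not necessarily one of the paper's 20 datasets.

```python
# Sketch of the quoted protocol: five trials of 5-fold cross-validation,
# with the final accuracy averaged over all 25 runs.
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Illustrative OpenML dataset; the paper's 20 datasets are listed in its Table 2.
X, y = fetch_openml("phoneme", version=1, return_X_y=True, as_frame=False)

scores = []
for trial in range(5):  # five independent trials
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=trial)
    clf = RandomForestClassifier(n_estimators=100)  # 100 trees, as in the paper
    scores.extend(cross_val_score(clf, X, y, cv=cv))  # 5 fold accuracies per trial

print(f"mean accuracy over {len(scores)} runs: {np.mean(scores):.4f}")
```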
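Similarly, the hyperparameter rules quoted in the Experiment Setup row reduce to a few lines of logic. The helper below is a hypothetical illustration, not the authors' C++ interface; `forest_config` and its field names are assumptions introduced here.

```python
# Hypothetical helper mirroring the quoted hyperparameter rules.
import math

def forest_config(n_samples: int, n_features: int) -> dict:
    return {
        "n_estimators": 100,                               # 100 trees per forest
        "candidate_features": int(math.sqrt(n_features)),  # sqrt(d) features tried per split
        "alpha": 10 if n_samples < 20_000 else 100,        # leaf-splitting parameter, per [95]
    }

# e.g. forest_config(5000, 64) -> {'n_estimators': 100, 'candidate_features': 8, 'alpha': 10}
```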