On the Gini-impurity Preservation For Privacy Random Forests

Authors: Xin-Ran Xie, Man-Jie Yuan, Xue-Tong Bai, Wei Gao, Zhi-Hua Zhou

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We finally present extensive empirical studies to validate the effectiveness, efficiency and security of our proposed method. [...] Extensive experiments show that our encrypted random forests take significantly better performance than prior privacy random forests via encryption, anonymization and differential privacy, and are comparable to original (plaintexts) random forests without encryption."
Researcher Affiliation | Academia | "Xin-Ran Xie, Man-Jie Yuan, Xue-Tong Bai, Wei Gao, Zhi-Hua Zhou. National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China. {xiexr,yuanmj,baixt,gaow,zhouzh}@lamda.nju.edu.cn"
Pseudocode | Yes | "Algorithm 1 The Gini-impurity-preserving encryption [...] Algorithm 2 Splitting a node for encryption [...] Algorithm 3 Finding the best splitting feature and position" (a plaintext sketch of the split-search criterion appears after this table)
Open Source Code | No | The paper does not provide an explicit statement or link to its own source code. It only mentions where the code for comparison methods was downloaded from.
Open Datasets | Yes | "We conduct experiments on 20 datasets as summarized in Table 2. Most datasets have been well-studied in previous random forests." A footnote states the datasets were downloaded from www.openml.org. (See the OpenML fetch sketch after this table.)
Dataset Splits | Yes | "The performance is evaluated by five trials of 5-fold cross validation, and final prediction accuracies are obtained by averaging over these 25 runs, as summarized in Table 3." (See the cross-validation sketch after this table.)
Hardware Specification | Yes | "All experiments are performed by c++ on the Ubuntu with 256GB main memory (AMD Ryzen Threadripper 3970X)."
Software Dependencies | No | The paper mentions "c++ on the Ubuntu" but does not provide specific version numbers for compilers, libraries, or other software dependencies beyond the operating system.
Experiment Setup | Yes | "For all random forests, we train 100 individual decision trees, and randomly select d candidate features during node splitting. We set α = 10 for datasets of size smaller than 20,000 for our encrypted random forests; otherwise, set α = 100, following [95]. For multi-class datasets, we take the one-vs-all method for Mul PRFs, since it is limited to binary classification. Other parameters are set according to their respective references, and more details can be found in Appendix D. Tables 4 and 5 summarize some hyperparameter settings in our experiments. Except for parameters n_estimators and α in leaf splitting, other parameters are set according to their respective references. We set security parameter λ > 6.4 according to privacy-preserving requisites as in [89]." (See the hyperparameter sketch after this table.)
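
The pseudocode rows above center on Gini-impurity-based split selection; the paper's contribution is keeping the ordering of these impurity scores intact under encryption. As a point of reference for Algorithm 3, the following is a minimal plaintext sketch of the standard Gini split search; the function names and structure are illustrative, not the paper's encrypted protocol.

    import numpy as np

    def gini(labels: np.ndarray) -> float:
        """Gini impurity of a label vector."""
        if labels.size == 0:
            return 0.0
        _, counts = np.unique(labels, return_counts=True)
        p = counts / labels.size
        return 1.0 - np.sum(p ** 2)

    def best_split(X: np.ndarray, y: np.ndarray, candidate_features):
        """Return (feature, threshold, score) minimizing weighted Gini impurity."""
        n = y.size
        best = (None, None, np.inf)
        for j in candidate_features:
            order = np.argsort(X[:, j])
            xs, ys = X[order, j], y[order]
            for i in range(1, n):
                if xs[i] == xs[i - 1]:
                    continue  # no threshold separates equal feature values
                left, right = ys[:i], ys[i:]
                score = (i * gini(left) + (n - i) * gini(right)) / n
                if score < best[2]:
                    best = (j, (xs[i] + xs[i - 1]) / 2.0, score)
        return best

Under the paper's scheme, it is the comparison of such weighted-impurity scores that must remain possible over ciphertexts.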
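
The 20 benchmark datasets are reported as downloaded from www.openml.org. A minimal sketch of pulling one OpenML dataset with scikit-learn's fetch_openml; the dataset name below is an arbitrary example and not necessarily one of the paper's Table 2 datasets.

    from sklearn.datasets import fetch_openml

    # Example only: any OpenML dataset name/version can be substituted here.
    data = fetch_openml(name="diabetes", version=1, as_frame=True)
    X, y = data.data, data.target
    print(X.shape, y.value_counts())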
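
The evaluation protocol is five trials of 5-fold cross validation, with accuracy averaged over the 25 resulting runs. A sketch of that protocol, using scikit-learn's plaintext RandomForestClassifier as a stand-in for the paper's encrypted forests:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    def evaluate(X, y, n_trials=5, n_folds=5):
        """Five trials of 5-fold CV; returns mean and std over the 25 runs."""
        scores = []
        for trial in range(n_trials):
            cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=trial)
            clf = RandomForestClassifier(n_estimators=100, random_state=trial)
            scores.extend(cross_val_score(clf, X, y, cv=cv, scoring="accuracy"))
        return np.mean(scores), np.std(scores)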
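
The reported setup trains 100 trees per forest and sets α = 10 for datasets with fewer than 20,000 examples and α = 100 otherwise. A small sketch of that selection rule; the function name and dict layout are hypothetical, and the remaining parameters follow the references cited in the paper.

    def forest_config(n_samples: int) -> dict:
        """Hyperparameters as reported: 100 trees, size-dependent alpha."""
        return {
            "n_estimators": 100,
            "alpha": 10 if n_samples < 20_000 else 100,
        }

    print(forest_config(5_000))   # {'n_estimators': 100, 'alpha': 10}
    print(forest_config(50_000))  # {'n_estimators': 100, 'alpha': 100}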