Decision Tree for Locally Private Estimation with Public Data
Authors: Yuheng Ma, Han Zhang, Yuchao Cai, Hanfang Yang
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on both synthetic and real-world data to demonstrate the superior performance of LPDT compared with other state-of-the-art LDP regression methods. |
| Researcher Affiliation | Academia | (1) School of Statistics, Renmin University of China; (2) Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente; (3) Center for Applied Statistics, Renmin University of China |
| Pseudocode | Yes | Algorithm 1: Locally differentially private decision tree (LPDT). Input: private data $D = \{(X_i, Y_i)\}_{i=1}^{n}$, public data $D^{\mathrm{pub}} = \{(X_i^{\mathrm{pub}}, Y_i^{\mathrm{pub}})\}_{i=1}^{n_q}$. Parameters: depth $s$, minimum leaf sample size $n_l$. The curator creates the tree partition $\pi$ following the max-edge rule in Section 2.3 on the public data $D^{\mathrm{pub}}$; data holders of $D$ create the privatized information (3) and (4) according to $\pi$; the curator aggregates the privatized information and computes $f^{\mathrm{DP}}_{\pi}$ by (5). Output: the LPDT estimator $f^{\mathrm{DP}}_{\pi}$. *(A runnable sketch of this procedure appears after the table.)* |
| Open Source Code | Yes | The code of LPDT is available on GitHub: https://github.com/Karlmyh/LPDT |
| Open Datasets | Yes | ABA: The Abalone dataset originally comes from biological research [44] and is now accessible on the UCI Machine Learning Repository [22]. AIR: The Airfoil Self-Noise dataset on the UCI Machine Learning Repository records the results of a series of aerodynamic and acoustic tests of airfoil blade sections conducted in an anechoic wind tunnel [12]. It comprises 1503 instances of 6 attributes, including wind tunnel speeds and angles of attack. ... The dataset used in this study was obtained from the Differential Privacy Temporal Map Challenge (DeID2), which aims to develop algorithms that preserve data utility while guaranteeing individual privacy protection. |
| Dataset Splits | Yes | We employ 5-fold cross-validation for parameter selection, and techniques for tuning parameters under LDP are discussed in Section D.2. The evaluation metric is the mean squared error (MSE). We conduct experiments on 12 real datasets, each repeated 50 times with a ratio of 1:7:2 for public data, training data, and testing data in each trial. *(This split-and-repeat protocol is sketched after the table.)* |
| Hardware Specification | Yes | All experiments are conducted on a machine with 72-core Intel Xeon 2.60GHz and 128GB of main memory. |
| Software Dependencies | No | The paper mentions 'Scikit-Learn' and 'Python' but does not specify version numbers for these or any other software components, which is required for reproducibility. |
| Experiment Setup | Yes | Experiment setup: We choose the privacy budget $\varepsilon \in [0.5, 8]$, covering commonly seen magnitudes of privacy budgets from low to high privacy regimes. We compare LPDT-M and LPDT-V with the following methods: (i) Private Histogram (PHIST) [9]; (ii) Adjusted Private Histogram (APHIST) [33]; (iii) Deconvolution Kernel (DECONV) [29]. An introduction to the methods and all implementation details are presented in Appendix D.1. We employ 5-fold cross-validation for parameter selection, and techniques for tuning parameters under LDP are discussed in Section D.2. The evaluation metric is the mean squared error (MSE). For LPDT-M and LPDT-V, we choose $n_l \in \{2, 5, 10, 20, 40, 60, 80, 100, 120, 140, 160\}$ and $s \in \{1, 2, 3, 4\}$ in Section 4.2. For large data in Section 4.3, we let $s \in \{4, 6, 8, 10\}$. In addition, we add one more parameter adjusting the allocation of the privacy budget between the numerator and denominator. Specifically, let (3) and (4) be ... for $\rho \in [0, 1]$. In this case, the mechanisms are respectively $\rho\varepsilon$-LDP and $(1-\rho)\varepsilon$-LDP, which means the hybrid mechanism is still $\varepsilon$-LDP. We select $\rho \in \{0.3, 0.5, 0.7\}$. *(The budget allocation and parameter grid are sketched after the table.)* |
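
The Pseudocode row states Algorithm 1 only at a high level. Below is a minimal Python/NumPy sketch of the same three-step flow, under explicit assumptions: the max-edge rule is approximated by midpoint splits along the longest cell edge, and the paper's privatized quantities (3)-(5) are replaced by per-user Laplace noise on the leaf indicator and the indicator-times-label vector, with labels assumed bounded by M. Function names and noise calibrations are ours for illustration, not taken from the authors' repository.

```python
import numpy as np

def max_edge_partition(X_pub, depth):
    """Partition the bounding box of the public data by repeatedly splitting every
    cell along its longest edge at the midpoint (a simplified stand-in for the
    paper's max-edge rule in Section 2.3)."""
    bounds = np.column_stack([X_pub.min(axis=0), X_pub.max(axis=0)])  # shape (d, 2)
    cells = [bounds]
    for _ in range(depth):
        new_cells = []
        for cell in cells:
            j = int(np.argmax(cell[:, 1] - cell[:, 0]))   # dimension with the longest edge
            mid = 0.5 * (cell[j, 0] + cell[j, 1])         # midpoint split (assumption)
            left, right = cell.copy(), cell.copy()
            left[j, 1], right[j, 0] = mid, mid
            new_cells.extend([left, right])
        cells = new_cells
    return cells

def leaf_index(x, cells):
    """Index of the first cell containing x (fallback to the last cell)."""
    for k, cell in enumerate(cells):
        if np.all(x >= cell[:, 0]) and np.all(x <= cell[:, 1]):
            return k
    return len(cells) - 1

def lpdt_fit(X, Y, cells, eps, rho=0.5, M=1.0, rng=None):
    """Each data holder privatizes its leaf indicator and indicator-times-label
    vector with per-user Laplace noise (our stand-in for (3) and (4)); the curator
    aggregates and forms per-leaf ratio estimates (our stand-in for (5))."""
    rng = np.random.default_rng(rng)
    K = len(cells)
    num = np.zeros(K)   # aggregated privatized label sums per leaf
    den = np.zeros(K)   # aggregated privatized counts per leaf
    for x, y in zip(X, Y):
        k = leaf_index(x, cells)
        u = np.zeros(K); u[k] = y      # indicator * label, assuming |Y| <= M
        v = np.zeros(K); v[k] = 1.0    # leaf indicator
        num += u + rng.laplace(scale=2.0 * M / (rho * eps), size=K)      # rho*eps-LDP
        den += v + rng.laplace(scale=2.0 / ((1.0 - rho) * eps), size=K)  # (1-rho)*eps-LDP
    return np.where(den > 0, num / np.where(den > 0, den, 1.0), 0.0)

def lpdt_predict(X_new, cells, leaf_values):
    """Predict by looking up the estimated value of the leaf containing each point."""
    return np.array([leaf_values[leaf_index(x, cells)] for x in X_new])
```

A call chain such as `cells = max_edge_partition(X_pub, depth=s)`, `leaf_values = lpdt_fit(X_train, Y_train, cells, eps, rho)`, `lpdt_predict(X_test, cells, leaf_values)` mirrors the curator/data-holder/curator steps of Algorithm 1.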
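
The Experiment Setup row splits the budget as $\rho\varepsilon$ for the numerator mechanism and $(1-\rho)\varepsilon$ for the denominator mechanism, with the hybrid release remaining $\varepsilon$-LDP by composition. A short sketch of that split together with the reported tuning grid is below; variable names and the specific $\varepsilon$ evaluation points are illustrative, not taken from the authors' code.

```python
from itertools import product

# Tuning grid reported in Section 4.2 of the paper; names are ours.
n_l_grid = [2, 5, 10, 20, 40, 60, 80, 100, 120, 140, 160]  # minimum leaf sample size
s_grid = [1, 2, 3, 4]          # tree depth (the paper uses {4, 6, 8, 10} for large data)
rho_grid = [0.3, 0.5, 0.7]     # fraction of the budget spent on the numerator
eps_points = [0.5, 1, 2, 4, 8] # illustrative evaluation points in [0.5, 8]

def split_budget(eps, rho):
    """Allocate rho*eps to the numerator mechanism and (1 - rho)*eps to the
    denominator mechanism; by sequential composition the pair is eps-LDP."""
    return rho * eps, (1.0 - rho) * eps

# Candidate configurations selected via 5-fold cross-validation in the paper.
candidate_configs = [
    {"n_l": n_l, "depth": s, "rho": rho}
    for n_l, s, rho in product(n_l_grid, s_grid, rho_grid)
]
```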
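
The Dataset Splits row quotes a 1:7:2 public/training/testing split repeated 50 times with MSE as the metric. A minimal sketch of that evaluation loop follows; the `fit_and_predict` interface is a placeholder we introduce for illustration (for instance, the LPDT sketch above), not the authors' API.

```python
import numpy as np

def evaluate(X, Y, fit_and_predict, n_trials=50, seed=0):
    """Repeat the 1:7:2 public/train/test protocol and report the mean MSE."""
    rng = np.random.default_rng(seed)
    n = len(X)
    mses = []
    for _ in range(n_trials):
        idx = rng.permutation(n)
        n_pub, n_tr = int(0.1 * n), int(0.7 * n)          # 10% public, 70% train, rest test
        pub, tr, te = idx[:n_pub], idx[n_pub:n_pub + n_tr], idx[n_pub + n_tr:]
        pred = fit_and_predict(X[pub], Y[pub], X[tr], Y[tr], X[te])
        mses.append(float(np.mean((pred - Y[te]) ** 2)))  # mean squared error
    return float(np.mean(mses))
```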