reproducibilityindex.ai

Differential Privacy in Scalable General Kernel Learning via $K$-means Nystr{\"o}m Random Features

Authors: Bonwoo Lee, Jeongyoun Ahn, Cheolwoo Park

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct experiments to demonstrate the performance of our algorithms, comparing them with existing methods to highlight their superiority.
Researcher Affiliation	Academia	Bonwoo Lee Jeongyoun Ahn Cheolwoo Park KAIST Daejeon, 34141 South Korea
Pseudocode	Yes	Algorithm 1 DP K-means Nyström approximation.
Open Source Code	Yes	Also, the code attached in the supplementary is written by python, and does not use any non-open-sourced or non-free libraries.
Open Datasets	Yes	We assess the estimation error of DP KME algorithms using the adult dataset (see Appendix 6.1 for details) with a Gaussian kernel for a fixed number of landmark points m. Although Algorithm 3 uses the K-means-based Nyström method, we also explore a subsampling-based version (outlined in Algorithm 6 in Appendix 6.2) with a privacy budget that is 100 times larger, as it demonstrates meaningful results in low privacy regions. The real dataset, the adult dataset(Becker and Kohavi, 1996), can be downloaded from the UCI Machine Learning Repository.
Dataset Splits	No	1,000,000 training samples and 200,000 test samples were generated for each run to solve and evaluate the accuracy of the DP kernel ERM.
Hardware Specification	Yes	We use five Intel Xeon Gold 6248 processors for the computing resources in the simulation and one in the other experiments.
Software Dependencies	No	No specific versions for software dependencies like Python or PyTorch are provided in the paper.
Experiment Setup	Yes	The regularization parameter λ {10 5, 10 4, 10 3, 10 2.5, 10 2} is used in tuning. Figure 1a shows the highest accuracy achieved for each privacy parameter across these values of λ. The hyperparameters for Algorithm 1 are set as follows: the distribution Q is defined as a mixture of m truncated normal distributions, each supported on [0, 1]d and centered at {zi}i=1m, with standard deviation σi := maxj =i zj zi 2. Additionally, the thresholding parameter m0 is set to 0.01n .