Instance-Specific Asymmetric Sensitivity in Differential Privacy
Authors: David Durfee
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide theoretical guarantees and empirical validation that our technique will be particularly effective when the distances to the underlying dataset are asymmetric. We efficiently instantiate our method in O(n) time for these problems and empirically show that our techniques will give substantially improved differentially private estimations. We further invoke our method upon model evaluation for both classification and regression tasks with corresponding efficient implementations and empirical studies showing improved estimations (see Section 6). |
| Researcher Affiliation | Industry | David Durfee Mozilla Anonym ddurfee@mozilla.com |
| Pseudocode | Yes | Algorithm 1 Above Threshold. Require: Input dataset x, a stream of queries {f_i} with sensitivity Δ, and a threshold T. Algorithm 2 Variance instantiation of asymmetric sensitivity mechanism. Require: Input dataset x, and parameter β > 1 |
| Open Source Code | No | The paper does not provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material. Justification: The data is open source and the code is straightforward to reproduce as all algorithms are simple, but we have not open sourced the code. We'd be happy to provide all code used upon request. |
| Open Datasets | Yes | For our instantiations of machine learning model evaluation we will be using the following datasets for regression tasks: Diamonds dataset containing diamond prices and related features Wickham (2016); Abalone dataset containing age of abalone and related features Nash et al. (1995); and Bike dataset containing number of bike rentals and related features Fanaee-T (2013). We will also use the labels from these datasets to test our variance invocation. We also use the Adult dataset, Becker & Kohavi (1996), for model evaluation of classification tasks so we will borrow two of the features, age and hours worked per week, to test our variance invocation. For the tabular data, we partition into train and test with an 80/20 split and train with an xgboost classifier with the default parameters. For the mnist data, which is already partitioned, we use a simple MLP with one inner dense layer of 128 neurons and relu activation, and the final layer of 10 neurons has a softmax activation. We train this model for 5 epochs. For the cifar10 data, which is already partitioned, we use a relatively small CNN with several pooling and convolutional layers… |
| Dataset Splits | No | For the tabular data, we partition into train and test with an 80/20 split and train with an xgboost classifier with the default parameters. |
| Hardware Specification | No | Our algorithms are lightweight so we just used basic colab notebooks to run the different empirical studies, but this was not specified in the paper. |
| Software Dependencies | No | The paper mentions software like 'xgboost classifier' and 'MLP' but does not provide specific version numbers for these software components or other key dependencies. |
| Experiment Setup | Yes | For the tabular data, we partition into train and test with an 80/20 split and train with an xgboost classifier with the default parameters. For the mnist data, which is already partitioned, we use a simple MLP with one inner dense layer of 128 neurons and relu activation, and the final layer of 10 neurons has a softmax activation. We train this model for 5 epochs. Our algorithm (Algorithm 2 in Appendix D) has a parameter β, which we fix at β = 1.005 across all experiments. |
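The Above Threshold routine named in the pseudocode row is the classic sparse-vector mechanism from the differential privacy literature. A minimal sketch follows; the function name, noise scales, and stopping rule follow the standard textbook formulation, not the paper's asymmetric-sensitivity instantiation, so treat it as an illustration of the referenced primitive only:

```python
import numpy as np

def above_threshold(queries, dataset, threshold, sensitivity, epsilon, rng=None):
    """Standard Above Threshold (sparse vector) mechanism.

    Returns the index of the first query whose noisy value exceeds a
    noisy threshold, or None if no query does. Each query is a function
    of the dataset with global sensitivity `sensitivity`.
    """
    rng = rng or np.random.default_rng()
    # Perturb the threshold once, with Laplace noise of scale 2*Delta/epsilon.
    noisy_t = threshold + rng.laplace(scale=2.0 * sensitivity / epsilon)
    for i, q in enumerate(queries):
        # Perturb each query answer with fresh Laplace noise of scale 4*Delta/epsilon.
        noisy_q = q(dataset) + rng.laplace(scale=4.0 * sensitivity / epsilon)
        if noisy_q >= noisy_t:
            return i  # halt at the first above-threshold query
    return None
```

The mechanism satisfies ε-differential privacy regardless of the stream length because only the single halting index is released; this is why it is a common building block for instance-specific mechanisms such as the one evaluated in this paper.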