Instance-Specific Asymmetric Sensitivity in Differential Privacy
Authors: David Durfee
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide theoretical guarantees and empirical validation that our technique will be particularly effective when the distances to the underlying dataset are asymmetric. We efficiently instantiate our method in O(n) time for these problems and empirically show that our techniques will give substantially improved differentially private estimations. We further invoke our method upon model evaluation for both classification and regression tasks with corresponding efficient implementations and empirical studies showing improved estimations (see Section 6). |
| Researcher Affiliation | Industry | David Durfee Mozilla Anonym ddurfee@mozilla.com |
| Pseudocode | Yes | Algorithm 1 Above Threshold. Require: Input dataset x, a stream of queries {f_i} with sensitivity Δ, and a threshold T. Algorithm 2 Variance instantiation of asymmetric sensitivity mechanism. Require: Input dataset x, and parameter β > 1 |
| Open Source Code | No | The paper does not provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material. Justification: The data is open source and the code is straightforward to reproduce as all algorithms are simple, but we have not open sourced the code. We'd be happy to provide all code used upon request. |
| Open Datasets | Yes | For our instantiations of machine learning model evaluation we will be using the following datasets for regression tasks: Diamonds dataset containing diamond prices and related features Wickham (2016); Abalone dataset containing age of abalone and related features Nash et al. (1995); and Bike dataset containing number of bike rentals and related features Fanaee-T (2013). We will also use the labels from these datasets to test our variance invocation. We also use the Adult dataset, Becker & Kohavi (1996), for model evaluation of classification tasks so we will borrow two of the features, age and hours worked per week, to test our variance invocation. For the tabular data, we partition into train and test with an 80/20 split and train with an xgboost classifier with the default parameters. For the mnist data, which is already partitioned, we use a simple MLP with one inner dense layer of 128 neurons and relu activation, and the final layer of 10 neurons has a softmax activation. We train this model for 5 epochs. For the cifar10 data, which is already partitioned, we use a relatively small CNN with several pooling and convolutional layers… |
| Dataset Splits | No | For the tabular data, we partition into train and test with an 80/20 split and train with an xgboost classifier with the default parameters. |
| Hardware Specification | No | Our algorithms are lightweight so we just used basic colab notebooks to run the different empirical studies, but this was not specified in the paper. |
| Software Dependencies | No | The paper mentions software like 'xgboost classifier' and 'MLP' but does not provide specific version numbers for these software components or other key dependencies. |
| Experiment Setup | Yes | For the tabular data, we partition into train and test with an 80/20 split and train with an xgboost classifier with the default parameters. For the mnist data, which is already partitioned, we use a simple MLP with one inner dense layer of 128 neurons and relu activation, and the final layer of 10 neurons has a softmax activation. We train this model for 5 epochs. Our algorithm (Algorithm 2 in Appendix D) has a parameter β, which we fix at β = 1.005 across all experiments. |
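The Above Threshold routine named in the pseudocode row is the classic sparse-vector mechanism from the differential privacy literature. A minimal sketch follows; the function name, noise scales, and stopping rule follow the standard textbook formulation, not the paper's asymmetric-sensitivity instantiation, so treat it as an illustration of the referenced primitive only:

```python
import numpy as np

def above_threshold(queries, dataset, threshold, sensitivity, epsilon, rng=None):
    """Standard Above Threshold (sparse vector) mechanism.

    Returns the index of the first query whose noisy value exceeds a
    noisy threshold, or None if no query does. Each query is a function
    of the dataset with global sensitivity `sensitivity`.
    """
    rng = rng or np.random.default_rng()
    # Perturb the threshold once, with Laplace noise of scale 2*Delta/epsilon.
    noisy_t = threshold + rng.laplace(scale=2.0 * sensitivity / epsilon)
    for i, q in enumerate(queries):
        # Perturb each query answer with fresh Laplace noise of scale 4*Delta/epsilon.
        noisy_q = q(dataset) + rng.laplace(scale=4.0 * sensitivity / epsilon)
        if noisy_q >= noisy_t:
            return i  # halt at the first above-threshold query
    return None
```

The mechanism satisfies ε-differential privacy regardless of the stream length because only the single halting index is released; this is why it is a common building block for instance-specific mechanisms such as the one evaluated in this paper.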