Differentially Private n-gram Extraction

Authors: Kunho Kim, Sivakanth Gopi, Janardhan Kulkarni, Sergey Yekhanin

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we empirically evaluate the performance of our algorithms on two datasets: Reddit and MSNBC.
Researcher Affiliation | Industry | Kunho Kim (Microsoft, kuki@microsoft.com); Sivakanth Gopi (Microsoft Research, sigopi@microsoft.com); Janardhan Kulkarni (Microsoft Research, jakul@microsoft.com); Sergey Yekhanin (Microsoft Research, yekhanin@microsoft.com)
Pseudocode | Yes | In this section we describe our algorithm for DPNE. The pseudocode is presented in Algorithm 1.
Open Source Code | Yes | Code available at https://github.com/microsoft/differentially-private-ngram-extraction
Open Datasets | Yes | The Reddit dataset is a natural-language dataset used extensively in NLP applications, and is taken from the TensorFlow repository. The MSNBC dataset consists of page visits of users who browsed msnbc.com on September 28, 1999, recorded at the URL level and ordered by time.
Dataset Splits | No | The paper uses the Reddit and MSNBC datasets but does not specify how they were split into training, validation, or test sets. No explicit percentages, counts, or references to standard splits are provided for reproducibility.
Hardware Specification | No | The paper does not specify hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper does not list software dependencies with version numbers, such as the programming languages, libraries, or frameworks used for implementation or experimentation.
Experiment Setup | Yes | Throughout this section we fix T = 9, ε = 4, δ = 10⁻⁷, Δ₁ = ⋯ = Δ₉ = Δ₀ = 300, η = 0.01 unless otherwise specified.
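
The experiment-setup row mentions per-order contribution limits (Δ) alongside the privacy parameters ε and δ. As a rough illustration of the kind of pipeline DPNE describes, the sketch below extracts n-grams by capping each user's contribution, adding Gaussian noise to the aggregated counts, and releasing only n-grams whose noisy count clears a threshold. This is a minimal, hypothetical sketch for intuition: the function name, the specific `sigma`/`threshold` values, and the per-user capping rule are assumptions, not the paper's Algorithm 1, and the noise/threshold calibration needed for a real (ε, δ) guarantee is omitted.

```python
import random
from collections import Counter

def dp_extract_ngrams(user_docs, n=2, delta_max=300, sigma=10.0,
                      threshold=40.0, seed=0):
    """Hypothetical DP-style n-gram extraction via noisy count thresholding.

    Each user contributes at most `delta_max` distinct n-grams (bounding the
    sensitivity of the count vector); Gaussian noise is added to each count,
    and only n-grams whose noisy count exceeds `threshold` are released.
    """
    rng = random.Random(seed)
    counts = Counter()
    for tokens in user_docs:
        # Distinct n-grams per user, capped to bound one user's influence.
        ngrams = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
        for gram in sorted(ngrams)[:delta_max]:
            counts[gram] += 1
    released = {}
    for gram, count in counts.items():
        noisy = count + rng.gauss(0.0, sigma)
        if noisy > threshold:
            released[gram] = noisy
    return released
```

In the paper's setting this kind of step would be applied iteratively for n = 1, …, T with the per-order budgets Δ₁, …, Δ₉ quoted above; the sketch shows only a single order n.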