Differentially Private n-gram Extraction
Authors: Kunho Kim, Sivakanth Gopi, Janardhan Kulkarni, Sergey Yekhanin
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically evaluate the performance of our algorithms on two datasets: Reddit and MSNBC. |
| Researcher Affiliation | Industry | Kunho Kim Microsoft kuki@microsoft.com Sivakanth Gopi Microsoft Research sigopi@microsoft.com Janardhan Kulkarni Microsoft Research jakul@microsoft.com Sergey Yekhanin Microsoft Research yekhanin@microsoft.com |
| Pseudocode | Yes | In this section we describe our algorithm for DPNE. The pseudocode is presented in Algorithm 1. |
| Open Source Code | Yes | Code available at https://github.com/microsoft/differentially-private-ngram-extraction |
| Open Datasets | Yes | The Reddit dataset is a natural language dataset used extensively in NLP applications, and is taken from the TensorFlow repository. The MSNBC dataset consists of page visits of users who browsed msnbc.com on September 28, 1999, recorded at the level of URL and ordered by time. |
| Dataset Splits | No | The paper uses Reddit and MSNBC datasets but does not specify how these datasets were split into training, validation, or test sets for their experiments. No explicit percentages, counts, or references to standard splits are provided for reproducibility. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with their version numbers, such as programming languages, libraries, or frameworks used for implementation or experimentation. |
| Experiment Setup | Yes | Throughout this section we fix T = 9, ε = 4, δ = 10^-7, Δ_1 = … = Δ_9 = Δ_0 = 300, η = 0.01 unless otherwise specified. |
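To make the experiment-setup parameters concrete, the following is a minimal sketch of the count-noise-threshold pattern that differentially private n-gram extraction is built on: cap each user's contributions (the Δ bound), add Gaussian noise to the aggregate counts, and release only n-grams whose noisy count clears a threshold tuned by the false-discovery parameter η. The function name, the specific threshold formula, and the default values here are illustrative assumptions, not the paper's exact DPNE mechanism or its calibrated noise scale.

```python
import math
import random
from collections import Counter

def dp_ngram_counts(user_docs, n=2, delta_cap=300, sigma=10.0, eta=0.01, seed=0):
    """Sketch of DP n-gram release: per-user contribution capping,
    Gaussian noise on counts, and thresholding. Hypothetical helper,
    not the paper's Algorithm 1."""
    rng = random.Random(seed)
    counts = Counter()
    for doc in user_docs:  # one document per user
        tokens = doc.split()
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        # Cap each user at delta_cap distinct n-grams (the Δ bound).
        for g in list(dict.fromkeys(ngrams))[:delta_cap]:
            counts[g] += 1
    # Threshold so that an n-gram held by a single user survives the
    # noise with probability roughly eta (illustrative formula).
    threshold = 1 + sigma * math.sqrt(2 * math.log(1 / eta))
    released = {}
    for g, c in counts.items():
        noisy = c + rng.gauss(0.0, sigma)
        if noisy > threshold:
            released[g] = noisy
    return released
```

In the real algorithm, σ would be calibrated from (ε, δ) via the Gaussian mechanism and the extraction proceeds level by level from 1-grams up to T-grams; this sketch only shows the per-level release step.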