Collecting Telemetry Data Privately
Authors: Bolin Ding, Janardhan Kulkarni, Sergey Yekhanin
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct empirical evaluation on real-world counter datasets to verify our theoretical results. |
| Researcher Affiliation | Industry | Bolin Ding, Janardhan Kulkarni, Sergey Yekhanin; Microsoft Research; {bolind, jakul, yekhanin}@microsoft.com |
| Pseudocode | No | The paper describes collection mechanisms with mathematical formulas and textual descriptions, such as "Collection mechanism 1BitMean: When the collection of counter x_i(t) at time t is requested by the data collector, each user i sends one bit b_i(t), which is independently drawn from the distribution:", but does not provide a clearly labeled pseudocode block or algorithm steps formatted like code. |
| Open Source Code | No | The paper mentions that their mechanisms "have been deployed by Microsoft" but does not provide any information or links regarding the public availability of the source code for their methodology. |
| Open Datasets | No | The paper uses a "real-world dataset of 3 million users with their daily usage of an app (App A) collected (in seconds) over a continuous period of 31 days" which is an internal dataset, but it does not provide concrete access information (e.g., link, DOI, or citation to a public repository) for this dataset. |
| Dataset Splits | No | The paper does not provide specific details about train/validation/test dataset splits. It mentions running mechanisms on "31-day usage data with three counters" and varying the number of users by "randomly picking subsets of all users", which is not a standard data split methodology. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. There is no mention of specific CPU, GPU models, or other hardware specifications. |
| Software Dependencies | No | The paper does not provide a reproducible description of ancillary software with specific version numbers. It does not mention any software libraries, frameworks, or compilers used for implementation or experimentation with version details. |
| Experiment Setup | Yes | We vary the value of ε (ε = 0.1 to 10) and the number of users (n = 0.3 × 10^6 and 3 × 10^6, by randomly picking subsets of all users), and run all the mechanisms 3000 times on 31-day usage data with three counters. The domain size is m = 24 hours. We create k = 32 buckets on [0, 24 hours] with even widths to evaluate mechanisms for histogram estimation. |
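
The bucketing and subsampling in the Experiment Setup row can be sketched as follows. This is a minimal illustration and not the authors' code: the helper names (`bucket_index`, `sample_users`, `true_histogram`) are hypothetical, and it assumes counter values are recorded in seconds, as the Open Datasets row states. It only builds the non-private ground-truth histogram over k = 32 even-width buckets; the private mechanisms themselves are not reproduced here.

```python
import random

SECONDS_PER_HOUR = 3600
M_SECONDS = 24 * SECONDS_PER_HOUR  # domain size m = 24 hours, in seconds
K_BUCKETS = 32                     # k = 32 even-width buckets on [0, m]


def bucket_index(value_seconds, m=M_SECONDS, k=K_BUCKETS):
    """Map a counter value in [0, m] to one of k even-width buckets."""
    clipped = min(max(value_seconds, 0), m)
    # The right endpoint m is assigned to the last bucket.
    return min(int(clipped * k // m), k - 1)


def sample_users(user_ids, n, seed=0):
    """Pick a random subset of n users without replacement (seed is an assumption)."""
    rng = random.Random(seed)
    return rng.sample(list(user_ids), n)


def true_histogram(values, m=M_SECONDS, k=K_BUCKETS):
    """Fraction of values in each bucket: the non-private baseline."""
    counts = [0] * k
    for v in values:
        counts[bucket_index(v, m, k)] += 1
    total = len(values)
    if total == 0:
        return [0.0] * k
    return [c / total for c in counts]
```

With these defaults each bucket spans 24 h / 32 = 45 minutes (2700 seconds), so a value of 2700 seconds lands in the second bucket.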