Collecting Telemetry Data Privately

Authors: Bolin Ding, Janardhan Kulkarni, Sergey Yekhanin

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct empirical evaluation on real-world counter datasets to verify our theoretical results.
Researcher Affiliation | Industry | Bolin Ding, Janardhan Kulkarni, Sergey Yekhanin, Microsoft Research, {bolind, jakul, yekhanin}@microsoft.com
Pseudocode | No | The paper specifies its collection mechanisms with mathematical formulas and prose, for example "Collection mechanism 1BitMean: When the collection of counter x_i(t) at time t is requested by the data collector, each user i sends one bit b_i(t), which is independently drawn from the distribution:", but it contains no clearly labeled pseudocode or algorithm block.
Open Source Code | No | The paper states that the mechanisms "have been deployed by Microsoft" but gives no information or links about the public availability of source code for the method.
Open Datasets | No | The paper uses a "real-world dataset of 3 million users with their daily usage of an app (App A) collected (in seconds) over a continuous period of 31 days". This appears to be an internal dataset, and the paper gives no concrete access information for it (e.g., a link, DOI, or citation to a public repository).
Dataset Splits | No | The paper describes no train/validation/test splits. It mentions running the mechanisms on "31-day usage data with three counters" and varying the number of users by "randomly picking subsets of all users", but this is user subsampling, not a data split.
Hardware Specification | No | The paper does not describe the hardware used for its experiments; no CPU or GPU models or other hardware specifications are mentioned.
Software Dependencies | No | The paper names no software libraries, frameworks, or compilers and gives no version numbers, so the software environment cannot be reproduced from the text.
Experiment Setup | Yes | We vary the value of ε (ε from 0.1 to 10) and the number of users (n = 0.3 × 10^6 and 3 × 10^6, obtained by randomly picking subsets of all users), and run all the mechanisms 3000 times on 31-day usage data with three counters. The domain size is m = 24 hours. We create k = 32 buckets of equal width on [0, 24 hours] to evaluate mechanisms for histogram estimation.
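The Pseudocode row quotes the 1BitMean mechanism only up to "drawn from the distribution:". The sketch below is a minimal, non-authoritative illustration of that mechanism. It assumes the distribution Pr[b_i(t) = 1] = 1/(e^ε + 1) + (x_i(t)/m)·(e^ε − 1)/(e^ε + 1) and the unbiased mean estimator σ̂(t) = (m/n) Σ_i (b_i(t)(e^ε + 1) − 1)/(e^ε − 1), as we understand them from the paper; check both against the PDF before relying on them. The helper names `one_bit_mean_report` and `estimate_mean` are hypothetical, and the usage values are synthetic, not the paper's App A data.

```python
import math
import random


def one_bit_mean_report(x, m, eps, rng):
    """Client side (hypothetical helper): privatize one counter value x in [0, m] into a single bit.

    Assumed distribution: Pr[bit = 1] = 1/(e^eps + 1) + (x/m) * (e^eps - 1)/(e^eps + 1).
    The probability ranges from 1/(e^eps + 1) at x = 0 to e^eps/(e^eps + 1) at x = m,
    so any two inputs change the output odds by at most a factor of e^eps.
    """
    e = math.exp(eps)
    p_one = 1.0 / (e + 1.0) + (x / m) * (e - 1.0) / (e + 1.0)
    return 1 if rng.random() < p_one else 0


def estimate_mean(bits, m, eps):
    """Server side (hypothetical helper): unbiased estimate of the mean counter value from n bits."""
    e = math.exp(eps)
    n = len(bits)
    # Each term debiases one bit: E[(b(e+1) - 1)/(e - 1)] = x/m, then scale by m and average.
    return (m / n) * sum((b * (e + 1.0) - 1.0) / (e - 1.0) for b in bits)


if __name__ == "__main__":
    rng = random.Random(0)
    m = 24 * 3600  # domain size: 24 hours, in seconds
    n = 100_000
    # Synthetic usage values (NOT the paper's data): uniform in [0, m / 4]
    xs = [rng.uniform(0, m / 4) for _ in range(n)]
    true_mean = sum(xs) / n
    for eps in (0.1, 1.0, 10.0):
        bits = [one_bit_mean_report(x, m, eps, rng) for x in xs]
        est = estimate_mean(bits, m, eps)
        print(f"eps={eps:>5}: true={true_mean:9.1f}s  estimate={est:9.1f}s")
```

Running it shows the estimate tightening around the true mean as ε grows, which is the privacy/accuracy trade-off the Experiment Setup row varies ε to measure.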
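The Experiment Setup row specifies k = 32 equal-width buckets on [0, 24 hours]. Because usage is recorded in seconds, m = 86400 and each bucket spans 86400 / 32 = 2700 seconds (45 minutes). The short sketch below makes that mapping concrete. It assumes half-open buckets [j·w, (j+1)·w) with the right endpoint x = m folded into the last bucket; the summary does not state the paper's boundary convention. The helpers `bucket_width`, `bucket_index`, and `histogram` are hypothetical names for illustration.

```python
def bucket_width(m, k):
    """Width of each of k equal-width buckets on [0, m]."""
    return m / k


def bucket_index(x, m, k):
    """Map a value x in [0, m] to a bucket index in 0..k-1 (hypothetical helper).

    Assumes half-open buckets [j*w, (j+1)*w); x == m goes into the last bucket.
    """
    if not 0 <= x <= m:
        raise ValueError(f"value {x} outside domain [0, {m}]")
    return min(int(x * k // m), k - 1)


def histogram(values, m, k):
    """Count how many values fall into each of the k buckets (non-private, for reference)."""
    counts = [0] * k
    for x in values:
        counts[bucket_index(x, m, k)] += 1
    return counts


if __name__ == "__main__":
    M = 24 * 3600  # 24 hours in seconds
    K = 32
    print(bucket_width(M, K))  # prints 2700.0
    print(histogram([0, 2700, 86399, 86400], M, K))
```

A non-private histogram like this is the ground truth against which a privatized histogram estimate would be compared.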