Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Counting Distinct Elements Under Person-Level Differential Privacy
Authors: Thomas Steinke, Alexander Knop
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In addition to proving the above theoretical guarantees, we perform an experimental evaluation of our algorithm. |
| Researcher Affiliation | Industry | Alexander Knop, Google, EMAIL; Thomas Steinke, Google DeepMind, EMAIL |
| Pseudocode | Yes | Algorithm 1 Distinct Count Algorithm |
| Open Source Code | No | The paper does not provide any statement or link indicating that its source code is publicly available. |
| Open Datasets | Yes | We used four publicly available datasets to assess the accuracy of our algorithms compared to baselines. Two small datasets were used: Amazon Fashion 5-core [NLM19] (reviews of fashion products on Amazon) and Amazon Industrial and Scientific 5-core [NLM19] (reviews of industrial and scientific products on Amazon). Two large data sets were also used: Reddit [She20] (a data set of posts collected from r/AskReddit) and IMDb [N20; MDPHNP11] (a set of movie reviews scraped from IMDb). |
| Dataset Splits | No | The paper uses common datasets but does not provide specific details on how these datasets were split into training, validation, or testing sets for their experiments. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers used for implementation or experimentation. |
| Experiment Setup | Yes | Table 1: True and estimated (using DPDistinctCount with ε = 1, β = 0.05 and ℓmax = 100) counts per data set. |