Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Counting Distinct Elements Under Person-Level Differential Privacy

Authors: Thomas Steinke, Alexander Knop

NeurIPS 2023 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In addition to proving the above theoretical guarantees, we perform an experimental evaluation of our algorithm.
Researcher Affiliation	Industry	Alexander Knop Google EMAIL Thomas Steinke Google DeepMind EMAIL
Pseudocode	Yes	Algorithm 1 Distinct Count Algorithm
Open Source Code	No	The paper does not provide any statement or link indicating that its source code is publicly available.
Open Datasets	Yes	We used four publicly available datasets to assess the accuracy of our algorithms compared to baselines. Two small datasets were used: Amazon Fashion 5-core [NLM19] (reviews of fashion products on Amazon) and Amazon Industrial and Scientific 5-core [NLM19] (reviews of industrial and scientific products on Amazon). Two large data sets were also used: Reddit [She20] (a data set of posts collected from r/AskReddit) and IMDb [N20; MDPHNP11] (a set of movie reviews scraped from IMDb).
Dataset Splits	No	The paper uses common datasets but does not provide specific details on how these datasets were split into training, validation, or testing sets for their experiments.
Hardware Specification	No	The paper does not provide any specific details about the hardware used to run the experiments.
Software Dependencies	No	The paper does not specify any software dependencies with version numbers used for implementation or experimentation.
Experiment Setup	Yes	Table 1: True and estimated (using DPDistinctCount with ε = 1, β = 0.05 and ℓmax = 100) counts per data set.