Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Differentially Private Identity and Equivalence Testing of Discrete Distributions
Authors: Maryam Aliakbarpour, Ilias Diakonikolas, Ronitt Rubinfeld
ICML 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform an experimental evaluation of our algorithms on synthetic data. Our experiments illustrate that our private testers achieve small type I and type II errors with sample size sublinear in the domain size of the underlying distributions. |
| Researcher Affiliation | Academia | ¹CSAIL, MIT, Cambridge, MA 02139, USA; ²Department of Computer Science, USC, Los Angeles, CA 90089, USA; ³TAU, Tel Aviv-Yafo, Israel. |
| Pseudocode | Yes | Algorithm 1 Private Uniformity Testing via Unique Elements: Private-Unique-Elements-Uniformity; Algorithm 2 Private uniformity tester based on the number of collisions: Private-Collisions-Uniformity; Algorithm 3 Private Equivalence Tester: Private Equivalence-Test |
| Open Source Code | No | The paper does not provide any explicit statements or links regarding the availability of its source code. |
| Open Datasets | No | The paper uses "synthetic data" which is mathematically defined within the paper but does not refer to a publicly available dataset with a specific link, DOI, repository, or formal citation. |
| Dataset Splits | No | The paper does not specify training, validation, or test dataset splits. It describes running algorithms multiple times to estimate error probabilities on generated samples. |
| Hardware Specification | Yes | All experiments were performed on a computer with a 1.6 GHz Intel(R) Core(TM) i5-4200U CPU and 3 GB of RAM. |
| Software Dependencies | No | The paper mentions implementing algorithms but does not provide specific version numbers for any software dependencies (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | We run our two algorithms using samples from q+ and q- with the following parameters: n = 800,000, = 0.3, r = 300, and = 0.2. We vary the sample size starting from 50,000 and up to 3 × 10^6, increasing it by 50,000 at each step, and repeat the algorithm for r = 200 times to estimate the maximum of type I and type II errors. |
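
The sweep described in the Experiment Setup row can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the paper releases no source, so every function name here is hypothetical. The toy tester is a plain, non-private collision-count threshold on a small domain, standing in for the paper's differentially private testers, and the "far" distribution (uniform on half the domain) is an assumption chosen only to make the example run quickly. The number of repetitions is left as a parameter.

```python
import random
from collections import Counter


def sample_size_grid(start=50_000, stop=3_000_000, step=50_000):
    """Sample sizes from start to stop (inclusive) in increments of step."""
    return list(range(start, stop + 1, step))


def estimate_max_error(tester, draw_null, draw_far, m, repeats, rng):
    """Empirical maximum of the type I and type II error rates.

    tester(samples) returns True to accept the null hypothesis.
    draw_null and draw_far are hypothetical callables (m, rng) -> list of samples.
    """
    # Type I error: rejecting samples that come from the null distribution.
    type1 = sum(not tester(draw_null(m, rng)) for _ in range(repeats)) / repeats
    # Type II error: accepting samples that come from the far distribution.
    type2 = sum(tester(draw_far(m, rng)) for _ in range(repeats)) / repeats
    return max(type1, type2)


# --- Toy stand-ins (assumptions, NOT the paper's private algorithms) ---

def collision_count(samples):
    """Number of unordered pairs of equal samples."""
    return sum(c * (c - 1) // 2 for c in Counter(samples).values())


def make_toy_tester(n):
    """Non-private uniformity tester: accept if collisions are below a threshold."""
    def tester(samples):
        m = len(samples)
        # Uniform on n elements gives about C(m, 2) / n collisions in expectation;
        # uniform on half the domain gives about twice that. Split the difference.
        threshold = 1.5 * (m * (m - 1) / 2) / n
        return collision_count(samples) <= threshold
    return tester


def draw_uniform(n):
    """Sampler for the uniform distribution on {0, ..., n-1}."""
    return lambda m, rng: [rng.randrange(n) for _ in range(m)]


def draw_half_support(n):
    """Sampler for the uniform distribution on the first half of the domain."""
    return lambda m, rng: [rng.randrange(n // 2) for _ in range(m)]
```

With the defaults, `sample_size_grid()` yields 60 sample sizes, matching the 50,000 to 3 × 10^6 range in steps of 50,000. Calling `estimate_max_error` once per grid entry, with the private testers and the paper's q+ and q- samplers substituted for the toy stand-ins, would produce the error-versus-sample-size curve the setup describes.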