Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Replicable Distribution Testing
Authors: Ilias Diakonikolas, Jingyi Gao, Daniel Kane, Sihan Liu, Christopher Ye
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | The abstract and introduction concisely demonstrate our contributions and scope. Guidelines: The answer NA means that the abstract and introduction do not include the claims made in the paper. The abstract and/or introduction should clearly state the claims made, including the contributions made in the paper and important assumptions and limitations. A No or NA answer to this question will not be perceived well by the reviewers. The claims made should match theoretical and experimental results, and reflect how much the results can be expected to generalize to other settings. It is fine to include aspirational goals as motivation as long as it is clear that these goals are not attained by the paper. 2. Limitations Question: Does the paper discuss the limitations of the work performed by the authors? Answer: [Yes] Justification: Our new lower bound framework for replicable distribution testing does not require additional assumptions on testers. Our replicable closeness tester with near-optimal sample complexity runs in linear time in sample size, yet our replicable independence tester runs in polynomial time in sample size, and we wonder whether one can obtain a more efficient linear time algorithm. We discuss this in our paper. Guidelines: The answer NA means that the paper has no limitation while the answer No means that the paper has limitations, but those are not discussed in the paper. The authors are encouraged to create a separate |
| Researcher Affiliation | Academia | Ilias Diakonikolas University of Wisconsin-Madison EMAIL Jingyi Gao University of Wisconsin-Madison EMAIL Daniel M. Kane University of California, San Diego EMAIL Sihan Liu University of California, San Diego La Jolla, CA EMAIL Christopher Ye University of California, San Diego La Jolla, CA EMAIL |
| Pseudocode | Yes | Algorithm 1 INDEPENDENCESTATS. Input: a sample set $S_p$ from the unknown distribution $p$ over $[n_1] \times [n_2]$, where $n_1 \ge n_2$, and another sample set $S_q$ from $q$, the product of marginals of $p$. Parameters: domain sizes $n_1 > n_2$, tolerance $\varepsilon \in (0, 1/4)$, replicability $\rho \in (0, 1/4)$. Output: A test statistic related to whether these samples came from an independent distribution. 1: Set $m = \Theta\left(n_1^{2/3} n_2^{1/3} \rho^{-2/3} \varepsilon^{-4/3} + \sqrt{n_1 n_2}\, \rho^{-1} \varepsilon^{-2} + \rho^{-2} \varepsilon^{-2}\right)$. 2: Set $\alpha = \min(n_1/(100m), 1/100)$, $\beta = n_2/(100m)$. 3: Compute the flattened samples $S_p^f \cup S_q^f \leftarrow \mathrm{Flatten}(S_p \cup S_q; \alpha, \beta)$. 4: Abort and return 0 if $\lvert S_p \rvert - \lvert S_p^f \rvert > 10 n_1$ or $\lvert S_q \rvert - \lvert S_q^f \rvert > 10 n_2$. 5: Sample $\ell, \ell' \sim \mathrm{Poi}(m)$. Abort and return 0 if $\ell > \lvert S_p^f \rvert$ or $\ell' > \lvert S_q^f \rvert$. 6: Keep only the first $\ell$ samples of $S_p^f$ and only the first $\ell'$ samples of $S_q^f$. 7: Compute and return the closeness test statistic $Z_C(S_p^f, S_q^f)$. |
| Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [NA] Justification: The paper does not include experiments requiring code. |
| Open Datasets | No | Question: Does the paper provide CONCRETE ACCESS INFORMATION (specific link, DOI, repository name, formal citation with authors/year, or reference to established benchmark datasets) for a publicly available or open dataset? Answer: [NA] Justification: The paper does not include experiments. |
| Dataset Splits | No | Question: Does the paper provide SPECIFIC DATASET SPLIT INFORMATION (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning? Answer: [NA] Justification: The paper does not include experiments. |
| Hardware Specification | No | Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [NA] Justification: The paper does not include experiments. |
| Software Dependencies | No | Question: Does the paper provide SPECIFIC ANCILLARY SOFTWARE DETAILS (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment? Answer: [NA] Justification: The paper does not include experiments. |
| Experiment Setup | No | Question: Does the paper contain SPECIFIC EXPERIMENTAL SETUP DETAILS (concrete hyperparameter values, training configurations, or system-level settings) in the main text? Answer: [NA] Justification: The paper does not include experiments. |
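
The bookkeeping steps of the INDEPENDENCESTATS pseudocode quoted in the table (steps 1, 2, 5 and 6) can be sketched in Python. This is only an illustrative sketch, not the authors' implementation. It assumes the sample budget reads m = Θ(n1^(2/3) n2^(1/3) ρ^(-2/3) ε^(-4/3) + sqrt(n1 n2) ρ^(-1) ε^(-2) + ρ^(-2) ε^(-2)), with the hidden constant set to 1. All function names are hypothetical. The Flatten routine and the closeness statistic are not specified in the excerpt, so they are left out.

```python
import math
import random


def sample_budget(n1, n2, eps, rho, c=1.0):
    """Step 1: sample budget m. The constant c hidden by Theta is an
    assumption (c = 1 here); the excerpt does not state it."""
    t1 = n1 ** (2 / 3) * n2 ** (1 / 3) * rho ** (-2 / 3) * eps ** (-4 / 3)
    t2 = math.sqrt(n1 * n2) / (rho * eps ** 2)
    t3 = 1.0 / (rho ** 2 * eps ** 2)
    return math.ceil(c * (t1 + t2 + t3))


def flatten_params(n1, n2, m):
    """Step 2: alpha = min(n1 / (100 m), 1 / 100), beta = n2 / (100 m)."""
    return min(n1 / (100 * m), 1 / 100), n2 / (100 * m)


def poisson(mean, rng):
    """Draw Poi(mean) by counting unit-rate exponential arrivals in [0, mean]."""
    count = 0
    t = rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def poissonized_prefixes(flat_p, flat_q, m, rng):
    """Steps 5-6: draw ell, ell' ~ Poi(m) independently. Return None (the
    'abort' branch) if either exceeds the available flattened samples,
    otherwise return the first ell and first ell' samples."""
    ell, ell_prime = poisson(m, rng), poisson(m, rng)
    if ell > len(flat_p) or ell_prime > len(flat_q):
        return None
    return flat_p[:ell], flat_q[:ell_prime]
```

For example, `poissonized_prefixes(list(range(1000)), list(range(1000)), 5, random.Random(0))` returns two prefixes whose lengths are independent Poisson draws with mean 5. Drawing the two lengths independently is what step 5 of the pseudocode prescribes.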