Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Competitive Distribution Estimation: Why is Good-Turing Good
Authors: Alon Orlitsky, Ananda Theertha Suresh
NeurIPS 2015 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 'Figure 2: Simulation results for support 10000, number of samples ranging from 1000 to 50000, averaged over 200 trials.' and 'We compare the performance of this estimator to four estimators' |
| Researcher Affiliation | Academia | Alon Orlitsky UC San Diego EMAIL Ananda Theertha Suresh UC San Diego EMAIL |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. The methods are described mathematically and textually. |
| Open Source Code | No | No concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described in this paper was found. |
| Open Datasets | No | The paper describes generating data from various distributions (e.g., 'Uniform', 'Zipf', 'Dirichlet prior') for simulations, but does not refer to or provide access information for a publicly available or open dataset. |
| Dataset Splits | No | The paper describes simulation parameters like 'number of samples ranging from 1000 to 50000, averaged over 200 trials', but does not specify dataset splits (training, validation, test) or cross-validation setup for reproducibility. |
| Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | For symbols appearing $t$ times, if $\phi_{t+1} \geq \Omega(t)$, then the Good-Turing estimate is close to the underlying total probability mass, otherwise the empirical estimate is closer. Hence, for a symbol appearing $t$ times, if $\phi_{t+1} \geq t$, we use the Good-Turing estimator, otherwise we use the empirical estimator. If $n_x = t$, $q_x(x^n) = \begin{cases} \frac{t}{N} & \text{if } t > \phi_{t+1}, \\ \frac{\phi_{t+1}+1}{\phi_t} \cdot \frac{t+1}{N} & \text{else} \end{cases}$ ... All distributions have support size $k = 10000$. $n$ ranges from 1000 to 50000 and the results are averaged over 200 trials. |
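
The estimator quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not the authors' code (the table above reports that none was released). It assumes the second branch takes the usual Good-Turing form, (ϕ_{t+1} + 1) / ϕ_t · (t + 1) / N, with N chosen so that the estimates sum to one, and that ϕ_0 counts the unseen symbols of a known support. The names `hybrid_good_turing`, `sample`, and `support` are hypothetical.

```python
from collections import Counter


def hybrid_good_turing(sample, support):
    """Estimate a distribution over `support` from `sample`.

    Sketch of the empirical / Good-Turing combination described above.
    Assumptions (not confirmed by the source): the support is known, and
    symbols in `sample` that are outside `support` are ignored.
    """
    counts = Counter(sample)  # n_x: number of times symbol x appears
    # phi[t]: number of support symbols appearing exactly t times
    # (phi[0] counts the unseen symbols).
    phi = Counter(counts.get(x, 0) for x in support)

    weights = {}
    for x in support:
        t = counts.get(x, 0)
        if t > phi[t + 1]:
            # Few symbols appear t + 1 times: use the empirical count.
            weights[x] = float(t)
        else:
            # Otherwise use the Good-Turing form, with phi_{t+1} + 1 in the
            # numerator so that the weight is never zero.
            weights[x] = (phi[t + 1] + 1) / phi[t] * (t + 1)

    total = sum(weights.values())  # normalization factor N
    return {x: w / total for x, w in weights.items()}


if __name__ == "__main__":
    estimate = hybrid_good_turing(["a", "a", "b"], ["a", "b", "c", "d"])
    for symbol, prob in sorted(estimate.items()):
        print(symbol, prob)
```

To mimic the reported setup, one could draw n samples (1000 to 50000) from a distribution with support size k = 10000, call the function once per trial, and average a chosen loss over 200 trials. The loss and the sampling code are left out here because the table does not specify them.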