Competitive Distribution Estimation: Why is Good-Turing Good
Authors: Alon Orlitsky, Ananda Theertha Suresh
NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Figure 2: Simulation results for support 10000, number of samples ranging from 1000 to 50000, averaged over 200 trials." and "We compare the performance of this estimator to four estimators." |
| Researcher Affiliation | Academia | Alon Orlitsky, UC San Diego, alon@ucsd.edu; Ananda Theertha Suresh, UC San Diego, asuresh@ucsd.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. The methods are described mathematically and textually. |
| Open Source Code | No | No concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described in this paper was found. |
| Open Datasets | No | The paper describes generating data from various distributions (e.g., 'Uniform', 'Zipf', 'Dirichlet prior') for simulations, but does not refer to or provide access information for a publicly available or open dataset. |
| Dataset Splits | No | The paper describes simulation parameters like 'number of samples ranging from 1000 to 50000, averaged over 200 trials', but does not specify dataset splits (training, validation, test) or cross-validation setup for reproducibility. |
| Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | For symbols appearing t times, if φ_{t+1} = Ω(t), then the Good-Turing estimate is close to the underlying total probability mass, otherwise the empirical estimate is closer. Hence, for a symbol appearing t times, if φ_{t+1} ≥ t, we use the Good-Turing estimator, otherwise we use the empirical estimator. If n_x = t, q_x(x^n) = t/N if t > φ_{t+1}, and ((φ_{t+1}+1)/φ_t) · (t+1)/N otherwise ... All distributions have support size k = 10000. n ranges from 1000 to 50000 and the results are averaged over 200 trials. |
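
The Experiment Setup row quotes the paper's rule for combining the Good-Turing and empirical estimators. Below is a minimal Python sketch of that rule, under our reconstruction of the quoted formula: the empirical branch when t > φ_{t+1}, a (+1)-smoothed Good-Turing branch (t+1)·(φ_{t+1}+1)/φ_t otherwise, with N taken to be the overall normalization constant. Function and variable names are ours, not the paper's.

```python
import numpy as np
from collections import Counter

def combined_good_turing(sample):
    """Sketch of the combined estimator quoted above.

    For a symbol x seen t times: use the empirical count when t exceeds
    phi_{t+1} (the number of symbols seen exactly t+1 times), otherwise a
    (+1)-smoothed Good-Turing weight, then renormalize.  The exact smoothing
    follows our reading of the quoted formula and may differ in detail from
    the paper.
    """
    counts = Counter(sample)           # n_x for each observed symbol x
    phi = Counter(counts.values())     # phi_t = number of symbols seen exactly t times

    unnormalized = {}
    for x, t in counts.items():
        if t > phi[t + 1]:
            # Empirical branch: weight proportional to t.
            unnormalized[x] = float(t)
        else:
            # Good-Turing branch: weight proportional to (t+1) * (phi_{t+1}+1) / phi_t.
            unnormalized[x] = (t + 1) * (phi[t + 1] + 1) / phi[t]

    total = sum(unnormalized.values())  # normalization constant N
    return {x: w / total for x, w in unnormalized.items()}
```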
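To mirror the quoted simulation setup (support size k = 10000, n samples per trial, results averaged over trials), a harness along the following lines could be used with the sketch above. The Zipf-like true distribution is one of the families named in the table; the specific trial count and sample size shown, and the small floor on unseen symbols when computing the KL-divergence loss, are our assumptions for illustration.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), flooring q so symbols the estimator never saw (probability 0)
    do not make the divergence infinite."""
    q = np.maximum(q, eps)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def run_trial(p, n, rng):
    """Draw n samples from the true distribution p, fit the combined estimator
    from the previous sketch, and return the KL loss against p."""
    k = len(p)
    sample = rng.choice(k, size=n, p=p)
    est = combined_good_turing(sample)
    q = np.zeros(k)
    for x, v in est.items():
        q[x] = v
    return kl_divergence(p, q)

# Example parameters loosely following the quoted setup: k = 10000, a Zipf-like
# true distribution, n = 5000 samples; the paper averages over 200 trials with
# n ranging from 1000 to 50000.
rng = np.random.default_rng(0)
k = 10_000
p = 1.0 / np.arange(1, k + 1)
p /= p.sum()
losses = [run_trial(p, n=5000, rng=rng) for _ in range(5)]
print(np.mean(losses))
```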