Competitive Distribution Estimation: Why is Good-Turing Good

Authors: Alon Orlitsky, Ananda Theertha Suresh

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Figure 2: Simulation results for support 10000, number of samples ranging from 1000 to 50000, averaged over 200 trials." and "We compare the performance of this estimator to four estimators."
Researcher Affiliation | Academia | Alon Orlitsky, UC San Diego, alon@ucsd.edu; Ananda Theertha Suresh, UC San Diego, asuresh@ucsd.edu
Pseudocode | No | The paper contains no structured pseudocode or algorithm blocks; the methods are described mathematically and textually.
Open Source Code | No | No concrete access to source code (a specific repository link, an explicit code-release statement, or code in supplementary materials) was found for the methodology described in this paper.
Open Datasets | No | The paper describes generating simulation data from various distributions (e.g., Uniform, Zipf, Dirichlet prior), but does not refer to or provide access information for a publicly available or open dataset.
Dataset Splits | No | The paper gives simulation parameters ("number of samples ranging from 1000 to 50000, averaged over 200 trials"), but does not specify dataset splits (training, validation, test) or a cross-validation setup for reproducibility.
Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running the experiments are mentioned in the paper.
Software Dependencies | No | The paper does not list ancillary software with version numbers (e.g., library or solver names and versions) needed to replicate the experiments.
Experiment Setup | Yes | "For symbols appearing t times, if ϕ_{t+1} = Ω(t), then the Good-Turing estimate is close to the underlying total probability mass; otherwise the empirical estimate is closer. Hence, for a symbol appearing t times, if ϕ_{t+1} ≥ t, we use the Good-Turing estimator; otherwise we use the empirical estimator. If n_x = t, then q_x(x^n) = t/N if t > ϕ_{t+1}, and q_x(x^n) = ((ϕ_{t+1} + 1)/ϕ_t) · n_x/N otherwise. ... All distributions have support size k = 10000; n ranges from 1000 to 50000 and the results are averaged over 200 trials."
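The combined estimator quoted in the Experiment Setup row can be sketched in Python. This is a minimal reading of the excerpt, not the authors' code: the function name, the zero-probability handling of unseen symbols, and the final renormalization are assumptions made for illustration.

```python
def combined_estimator(sample, support):
    """Sketch of the combined Good-Turing/empirical estimator.

    For a symbol x appearing t times in a sample of size N, with
    phi[t] the number of distinct symbols appearing exactly t times:
      - if t > phi[t+1], use the empirical estimate t / N;
      - otherwise, use the smoothed Good-Turing-style estimate
        (phi[t+1] + 1) / phi[t] * t / N.
    """
    N = len(sample)
    counts = {}
    for x in sample:
        counts[x] = counts.get(x, 0) + 1
    # phi[t]: number of distinct symbols appearing exactly t times
    phi = {}
    for t in counts.values():
        phi[t] = phi.get(t, 0) + 1

    q = {}
    for x in support:
        t = counts.get(x, 0)
        if t == 0:
            q[x] = 0.0  # unseen-symbol mass is not covered by the excerpt
        elif t > phi.get(t + 1, 0):
            q[x] = t / N                                  # empirical branch
        else:
            q[x] = (phi[t + 1] + 1) / phi[t] * t / N      # Good-Turing branch
    # renormalize so the estimates form a distribution (an assumption)
    total = sum(q.values())
    if total > 0:
        q = {x: p / total for x, p in q.items()}
    return q
```

For example, on the sample `['a','a','b','c']` the symbol `a` (t = 2, ϕ_3 = 0) takes the empirical branch, while `b` and `c` (t = 1, ϕ_2 = 1) take the Good-Turing branch.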
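The simulation setup described above (support size k = 10000, synthetic distributions drawn as Uniform, Zipf, or from a Dirichlet prior, n from 1000 to 50000, losses averaged over 200 trials) can be sketched as follows. The Zipf exponent, the Dirichlet concentration parameter, the KL loss, and the single-trial structure are illustrative assumptions; the excerpt does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 10000  # support size used in the paper's simulations

def uniform_dist(k):
    # Uniform distribution over k symbols
    return np.full(k, 1.0 / k)

def zipf_dist(k, s=1.0):
    # Zipf distribution: p(i) proportional to 1 / i^s (s is an assumption)
    w = 1.0 / np.arange(1, k + 1) ** s
    return w / w.sum()

def dirichlet_prior_dist(k, alpha=1.0):
    # A distribution drawn from a symmetric Dirichlet prior (alpha assumed)
    return rng.dirichlet(np.full(k, alpha))

def empirical_estimate(sample, k):
    # Empirical estimator: normalized symbol counts
    counts = np.bincount(sample, minlength=k)
    return counts / counts.sum()

def kl_loss(p, q, eps=1e-12):
    # KL-style loss from the true distribution p to the estimate q,
    # with a small epsilon to avoid log(0)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# One trial: draw n samples from a Zipf distribution and score the
# empirical estimator; the paper averages such losses over 200 trials
# for n ranging from 1000 to 50000.
p = zipf_dist(k)
sample = rng.choice(k, size=5000, p=p)
loss = kl_loss(p, empirical_estimate(sample, k))
```

Repeating this trial loop over the three distribution families and the full range of n would reproduce the shape of the comparison in Figure 2.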