Data Amplification: A Unified and Competitive Approach to Property Estimation

Authors: Yi Hao, Alon Orlitsky, Ananda Theertha Suresh, Yihong Wu

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate the estimator's practical advantages by comparing it to existing estimators for a wide variety of properties and distributions. We evaluated the new estimator f by comparing its performance to several recent estimators [13-15, 22, 27]. To ensure robustness of the results, we performed the comparisons for all the symmetric properties described in the introduction... The results for the first three properties are shown in Figures 1-3...
Researcher Affiliation | Collaboration | Yi Hao, Dept. of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA 92093, yih179@eng.ucsd.edu; Alon Orlitsky, Dept. of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA 92093, alon@eng.ucsd.edu; Ananda T. Suresh, Google Research, New York, New York, NY 10011, theertha@google.com; Yihong Wu, Dept. of Statistics and Data Science, Yale University, New Haven, CT 06511, yihong.wu@yale.edu
Pseudocode | No | The paper describes the estimator's construction mathematically but does not include a dedicated pseudocode or algorithm block.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | We tested the five properties on the following distributions: uniform distribution; a distribution randomly generated from Dirichlet prior with parameter 2; Zipf distribution with power 1.5; Binomial distribution with success probability 0.3; Poisson distribution with mean 3,000; geometric distribution with success probability 0.99.
Dataset Splits | No | The paper does not explicitly provide details about train/validation/test dataset splits or cross-validation methodology.
Hardware Specification | No | The paper does not provide any specific hardware details (such as GPU/CPU models or memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers.
Experiment Setup | Yes | We tested the five properties on the following distributions... The number of samples, n, ranged from 1,000 to 100,000... Each experiment was repeated 100 times... We chose the amplification parameter $t$ as $\log^{1-\alpha} n + 1$, where $\alpha \in \{0.0, 0.1, 0.2, \ldots, 0.6\}$ was selected based on independent data, and similarly for $s_0$.