Worst-Case Analysis for Randomly Collected Data

Authors: Justin Chen, Gregory Valiant, Paul Valiant

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We experimentally demonstrate the benefit of this framework and our algorithm in comparison to standard estimators, for several such settings."
Researcher Affiliation | Academia | Justin Y. Chen (MIT, justc@mit.edu); Gregory Valiant (Stanford University, gvaliant@cs.stanford.edu); Paul Valiant (IAS and Purdue University, pvaliant@gmail.com)
Pseudocode | Yes | "Algorithm 1: SDP Algorithm yielding a π/2-approximation to the best semilinear estimator"
Open Source Code | Yes | "our code is available at https://github.com/justc2/worst-case-randomly-collected"
Open Datasets | No | The paper uses synthetic data generated from specified parameters (e.g., "n = 50 elements where the ith element is included... with probability p_i", "n = 50 points is drawn uniformly from the 2D unit square"). It does not reference or provide access information for a publicly available dataset in the conventional sense.
Dataset Splits | No | The paper describes how samples and target sets are drawn according to a known joint distribution P, but it does not specify explicit train/validation/test splits with percentages, sample counts, or predefined split references for model training or evaluation.
Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments, such as specific GPU or CPU models.
Software Dependencies | Yes | "our implementation of Algorithm 1 using the Python CVXPY package [9, 1] with the MOSEK solver [2]...MOSEK Optimizer API for Python 9.2.10, 2019."
Experiment Setup | Yes | "We consider a set of n = 50 elements where the ith element is included in the sample set independently with probability p_i, with p_1, ..., p_25 = 0.1 and p_26, ..., p_50 = 0.5. The target set is the entire population, i.e. the goal is to estimate the population mean."