reproducibilityindex.ai

Worst-Case Analysis for Randomly Collected Data

Authors: Justin Chen, Gregory Valiant, Paul Valiant

NeurIPS 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We experimentally demonstrate the benefit of this framework and our algorithm in comparison to standard estimators, for several such settings.
Researcher Affiliation	Academia	Justin Y. Chen (MIT, justc@mit.edu); Gregory Valiant (Stanford University, gvaliant@cs.stanford.edu); Paul Valiant (IAS and Purdue University, pvaliant@gmail.com)
Pseudocode	Yes	Algorithm 1: SDP algorithm yielding a π/2-approximation to the best semilinear estimator
Open Source Code	Yes	Our code is available at https://github.com/justc2/worst-case-randomly-collected.
Open Datasets	No	The paper uses synthetic data generated from specified parameters (e.g., 'n = 50 elements where the i-th element is included... with probability p_i', 'n = 50 points is drawn uniformly from the 2D unit square'). It does not refer to, or provide access information for, a publicly available dataset in the conventional sense.
Dataset Splits	No	The paper describes how samples and target sets are drawn according to a known joint distribution P, but it does not specify explicit train/validation/test dataset splits with percentages, sample counts, or predefined split references for model training or evaluation.
Hardware Specification	No	The paper does not explicitly describe the hardware used for its experiments, such as specific GPU or CPU models.
Software Dependencies	Yes	Our implementation of Algorithm 1 uses the Python CVXPY package [9, 1] with the MOSEK solver [2] ... MOSEK Optimizer API for Python 9.2.10, 2019.
Experiment Setup	Yes	We consider a set of n = 50 elements where the i-th element is included in the sample set independently with probability p_i, with p_1, ..., p_25 = 0.1 and p_26, ..., p_50 = 0.5. The target set is the entire population, i.e., the goal is to estimate the population mean.
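The experiment-setup row describes a concrete sampling model, so a short simulation makes it tangible. What follows is a minimal Python sketch, using only the standard library, that draws sample sets with the stated inclusion probabilities and measures the error of two textbook baselines for the population mean: the naive sample mean and the Horvitz-Thompson (inverse-probability-weighted) estimator. The choice of these two baselines, the population values, and every function name here (draw_sample, sample_mean, horvitz_thompson_mean, simulate) are assumptions made for illustration. This is not the paper's Algorithm 1 or the code in its repository.

```python
import random
import statistics

# Setup from the experiment row: n = 50 elements, element i is included
# independently with probability p_i (p_1..p_25 = 0.1, p_26..p_50 = 0.5).
N = 50
P = [0.1] * 25 + [0.5] * 25


def draw_sample(rng: random.Random) -> list[int]:
    """Return the indices included in one random sample set."""
    return [i for i in range(N) if rng.random() < P[i]]


def sample_mean(values: list[float], sample: list[int]) -> float:
    """Naive baseline (assumed): average of observed values, 0.0 if the sample is empty."""
    if not sample:
        return 0.0
    return sum(values[i] for i in sample) / len(sample)


def horvitz_thompson_mean(values: list[float], sample: list[int]) -> float:
    """Inverse-probability-weighted baseline (assumed); unbiased for the population mean."""
    return sum(values[i] / P[i] for i in sample) / N


def simulate(values: list[float], trials: int = 10000, seed: int = 0) -> tuple[float, float]:
    """Monte Carlo mean squared error of (sample_mean, horvitz_thompson_mean)."""
    rng = random.Random(seed)
    target = statistics.fmean(values)  # target set is the entire population
    se_naive = se_ht = 0.0
    for _ in range(trials):
        s = draw_sample(rng)
        se_naive += (sample_mean(values, s) - target) ** 2
        se_ht += (horvitz_thompson_mean(values, s) - target) ** 2
    return se_naive / trials, se_ht / trials


if __name__ == "__main__":
    # Hypothetical values: all mass sits in the rarely sampled first half.
    print(simulate([1.0] * 25 + [0.0] * 25))
    # Hypothetical constant population: the naive mean is exact.
    print(simulate([1.0] * 50))
```

With the first hypothetical population, the sample mean is pulled toward the over-sampled second half, and the Horvitz-Thompson error should come out close to its exact variance of 0.09, below the naive estimator's. With the constant population the sample mean is exact, while the Horvitz-Thompson estimate keeps a variance of exactly 0.1. Which baseline has lower error therefore depends on the unknown population values.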