Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Worst-Case Analysis for Randomly Collected Data
Authors: Justin Chen, Gregory Valiant, Paul Valiant
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally demonstrate the benefit of this framework and our algorithm in comparison to standard estimators, for several such settings. |
| Researcher Affiliation | Academia | Justin Y. Chen MIT EMAIL Gregory Valiant Stanford University EMAIL Paul Valiant IAS and Purdue University EMAIL |
| Pseudocode | Yes | Algorithm 1 SDP Algorithm yielding π/2-approximation to the best semilinear estimator |
| Open Source Code | Yes | our code is available at https://github.com/justc2/worst-case-randomly-collected. |
| Open Datasets | No | The paper utilizes synthetic data generated based on specified parameters (e.g., 'n = 50 elements where the ith element is included... with probability p_i', 'n = 50 points is drawn uniformly from the 2D unit square'). It does not refer to or provide access information for a publicly available or open dataset in the conventional sense. |
| Dataset Splits | No | The paper describes how samples and target sets are drawn according to a known joint distribution P, but it does not specify explicit train/validation/test dataset splits with percentages, sample counts, or predefined split references for model training or evaluation. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments, such as specific GPU or CPU models. |
| Software Dependencies | Yes | our implementation of Algorithm 1 using the Python CVXPY package [9, 1] with the MOSEK solver [2]...MOSEK Optimizer API for Python 9.2.10, 2019. |
| Experiment Setup | Yes | We consider a set of n = 50 elements where the ith element is included in the sample set independently with probability p_i, with p_1, ..., p_25 = 0.1 and p_26, ..., p_50 = 0.5. The target set is the entire population, i.e. the goal is to estimate the population mean. |
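
The sampling scheme quoted in the Experiment Setup row can be sketched in a few lines of Python. This is only an illustration of the data-collection process, not the paper's Algorithm 1 or its released code: the function names, the random seed, and the population values are hypothetical, and the plain sample mean is used here as a simple baseline estimator.

```python
import random


def inclusion_probabilities(n=50):
    """Return p_i = 0.1 for the first half of the elements and 0.5 for the rest."""
    half = n // 2
    return [0.1] * half + [0.5] * (n - half)


def draw_sample(probs, rng):
    """Include element i independently with probability probs[i]; return chosen indices."""
    return [i for i, p in enumerate(probs) if rng.random() < p]


def sample_mean(values, indices):
    """Baseline estimator: mean of the sampled values, or None if nothing was sampled."""
    if not indices:
        return None
    return sum(values[i] for i in indices) / len(indices)


if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary seed, chosen only for repeatability
    probs = inclusion_probabilities()
    # Hypothetical population values; the quoted setup does not specify them.
    values = [float(i) for i in range(len(probs))]
    sample = draw_sample(probs, rng)
    print("sample size:", len(sample))
    print("sample mean:", sample_mean(values, sample))
    print("population mean:", sum(values) / len(values))
```

Under these inclusion probabilities the expected sample size is 25 × 0.1 + 25 × 0.5 = 15, and elements 26 to 50 are five times as likely to be sampled as elements 1 to 25, so the sample mean can be far from the population mean whenever the two halves hold different values.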