Truthful Data Acquisition via Peer Prediction

Authors: Yiling Chen, Yiheng Shen, Shuran Zheng

NeurIPS 2020

Reproducibility Variable Result LLM Response
Research Type Theoretical Our work is the first to consider rewarding data without (good) test data that can be used to evaluate the quality of reported data. Similar to our setting, [12, 2] consider paying multiple data providers in a machine learning task. They use a test set to assess the contribution of subsets of data and then propose a fair measurement of the value of each data point in the dataset, based on the Shapley value from game theory. Neither of these works formally considers the incentive compatibility of payment allocation. [28] proposes a market framework that purchases hypotheses for a machine learning problem when the data is distributed among multiple agents. Again, they assume that the market has access to some true samples and that participants are paid according to their incremental contributions as evaluated on these true samples. In addition, there is a small body of literature (see [9] and subsequent work) on aggregating datasets using scoring rules that also considers signal distributions in exponential families.
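The Shapley-value approach to data valuation mentioned above values each data point by its average marginal contribution to a utility function (e.g., test-set accuracy) over all orderings of the dataset. A minimal illustrative sketch follows; the point names and the toy utility function here are assumptions for demonstration, not the construction used in the cited works:

```python
from itertools import permutations
from math import factorial

def shapley_values(points, utility):
    """Exact Shapley value of each point: its marginal contribution
    to utility, averaged over all n! orderings of the dataset."""
    values = {p: 0.0 for p in points}
    for order in permutations(points):
        coalition = set()
        prev = utility(coalition)
        for p in order:
            coalition.add(p)
            cur = utility(coalition)
            values[p] += cur - prev  # marginal contribution of p
            prev = cur
    n_orderings = factorial(len(points))
    return {p: v / n_orderings for p, v in values.items()}

# Toy utility: additive base values plus a synergy bonus when
# points "a" and "b" are both present (hypothetical example).
base = {"a": 1.0, "b": 2.0, "c": 3.0}

def utility(coalition):
    bonus = 1.0 if {"a", "b"} <= coalition else 0.0
    return sum(base[p] for p in coalition) + bonus

vals = shapley_values(["a", "b", "c"], utility)
# Linearity of the Shapley value: the synergy bonus is split
# evenly between "a" and "b", so vals = {a: 1.5, b: 2.5, c: 3.0}.
```

By efficiency, the values always sum to the utility of the full dataset, which is what makes this a "fair" division of the total contribution. The exact computation enumerates all n! orderings, so practical implementations (as in the cited works) rely on sampling-based approximations.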
Researcher Affiliation Academia Yiling Chen (Harvard University, yiling@seas.harvard.edu); Yiheng Shen (Tsinghua University, shen-yh17@mails.tsinghua.edu.cn); Shuran Zheng (Harvard University, shuran_zheng@seas.harvard.edu)
Pseudocode No The paper describes "Mechanism 1" and "Mechanism 2" with numbered steps, but these are presented as descriptive text blocks rather than formal pseudocode or algorithm environments labeled "Algorithm" or "Pseudocode".
Open Source Code No The paper does not contain any statement about releasing source code for the described methodology, nor does it provide any links to a code repository.
Open Datasets No The paper is theoretical and focuses on mechanism design rather than empirical evaluation with specific datasets. It mentions examples of data models (e.g., linear regression) but does not use named, publicly available datasets.
Dataset Splits No This paper is theoretical and does not describe empirical experiments with dataset splits. It does not provide information about training, validation, or test sets.
Hardware Specification No The paper is theoretical and does not describe any empirical experiments; therefore, it does not specify any hardware used for computations.
Software Dependencies No The paper is theoretical and does not describe any specific software dependencies or version numbers for implementation or experimentation.
Experiment Setup No The paper is theoretical and focuses on mechanism design rather than empirical evaluation. It does not describe any experimental setups, hyperparameters, or system-level training settings.