Truthful Data Acquisition via Peer Prediction
Authors: Yiling Chen, Yiheng Shen, Shuran Zheng
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our work is the first to consider rewarding data without (good) test data that can be used to evaluate the quality of reported data. Similar to our setting, [12, 2] consider paying multiple data providers in a machine learning task. They use a test set to assess the contribution of subsets of data and then propose a fair measure of the value of each data point in the dataset, based on the Shapley value from game theory. Neither of these works formally considers the incentive compatibility of the payment allocation. [28] proposes a market framework that purchases hypotheses for a machine learning problem when the data is distributed among multiple agents; again, it assumes the market has access to some true samples, and participants are paid according to their incremental contributions as evaluated on these true samples. In addition, there is a small literature (see [9] and subsequent work) on aggregating datasets using scoring rules that also considers signal distributions in exponential families. |
| Researcher Affiliation | Academia | Yiling Chen (Harvard University, yiling@seas.harvard.edu); Yiheng Shen (Tsinghua University, shen-yh17@mails.tsinghua.edu.cn); Shuran Zheng (Harvard University, shuran_zheng@seas.harvard.edu) |
| Pseudocode | No | The paper describes "Mechanism 1" and "Mechanism 2" with numbered steps, but these are presented as descriptive text blocks rather than formal pseudocode or algorithm environments with specific labeling like "Algorithm" or "Pseudocode". |
| Open Source Code | No | The paper does not contain any statement about releasing source code for the described methodology, nor does it provide any links to a code repository. |
| Open Datasets | No | The paper is theoretical and focuses on mechanism design rather than empirical evaluation with specific datasets. It mentions examples of data models (e.g., linear regression) but does not use named, publicly available datasets. |
| Dataset Splits | No | This paper is theoretical and does not describe empirical experiments with dataset splits. It does not provide information about training, validation, or test sets. |
| Hardware Specification | No | The paper is theoretical and does not describe any empirical experiments; therefore, it does not specify any hardware used for computations. |
| Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies or version numbers for implementation or experimentation. |
| Experiment Setup | No | The paper is theoretical and focuses on mechanism design rather than empirical evaluation. It does not describe any experimental setups, hyperparameters, or system-level training settings. |
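The Research Type row mentions the Shapley-value approach to data valuation used by the related works [12, 2]: each data point is paid its average marginal contribution to a utility function, averaged over all orderings of the data. As a hedged illustration (not code from the paper, which is purely theoretical), the sketch below computes exact Shapley values for a toy dataset; the utility function `utility` and the target value `TARGET` are hypothetical stand-ins for the test-set-based evaluation those works assume.

```python
import math
from itertools import permutations

def shapley_values(points, utility):
    """Exact Shapley values: for each point, average its marginal
    contribution to `utility` over every ordering of the points."""
    n = len(points)
    values = [0.0] * n
    for order in permutations(range(n)):
        coalition = []
        for i in order:
            before = utility([points[j] for j in coalition])
            coalition.append(i)
            after = utility([points[j] for j in coalition])
            values[i] += after - before
    return [v / math.factorial(n) for v in values]

# Toy utility standing in for test-set accuracy: how well the
# subset's mean estimates a known target (0.0 for the empty set).
TARGET = 4.0

def utility(subset):
    if not subset:
        return 0.0
    mean = sum(subset) / len(subset)
    return -(mean - TARGET) ** 2

data = [3.0, 4.0, 9.0]
vals = shapley_values(data, utility)
```

By the efficiency axiom, the values sum to `utility(data) - utility([])`, so the full utility of the dataset is exactly apportioned among the points. Note that this payment rule alone says nothing about incentive compatibility, which is the gap the paper's peer-prediction mechanisms address.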