Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Selling Data To a Machine Learner: Pricing via Costly Signaling

Authors: Junjie Chen, Minming Li, Haifeng Xu

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we use Example 3.3 to give a sense about the above finding and how the revenue (i.e., (7)) and the upper bound (i.e., RD(t)) change w.r.t. the shared quantity t. The curves of Example 3.3 are plotted in Figure 2.
Researcher Affiliation | Academia | ¹Department of Computer Science, City University of Hong Kong, Hong Kong, China ²Department of Computer Science, University of Chicago, Chicago, Illinois, USA (work done while this author is at UVA).
Pseudocode | No | The paper does not contain any pseudocode blocks or algorithms labeled as such.
Open Source Code | No | The paper does not provide any links to open-source code or explicitly state that code for their methodology is being released.
Open Datasets | No | The paper describes simulated examples (e.g., "Example 3.3", "Example J.1") with defined parameters like quantity of data, accuracy representation, prior belief, and accuracy distribution. However, it does not use or provide concrete access information for a publicly available dataset.
Dataset Splits | No | The paper defines parameters for computational examples and simulations (e.g., Example 3.3, J.1, J.2) but does not mention dataset splits such as training, validation, or test sets.
Hardware Specification | No | The paper does not specify any hardware used for running the computations or simulations.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers.
Experiment Setup | Yes | Example 3.3: Let t = 0%, 1%, ..., 100% be the quantity of data. Let r ∈ {0, 1, 2, ..., 10} represent 0%, 10%, ..., 100% accuracy, q ∈ {0, 1, 2, ..., 10} and private type b ∈ {1, 2, ..., 10}. According to the above characterization, let the valuation function be... The prior belief µ(q) over q is a Gaussian with standard deviation σ = 3 and mean m = 3. The accuracy distribution λ(r|q, t) is also a Gaussian with m = round(q · t) and σ = 0.1 · (−(t − 0.5)² + 0.25). Let σ = 0 if q = 0. µ(q) and λ(r|q, t) will be normalized to a probability measure.
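
Since the paper releases no code, the two distributions in the Example 3.3 setup can be sketched as follows. This is a minimal illustration, not the authors' implementation. All function names are hypothetical. The operators lost in extraction are read here as m = round(q * t) and sigma = 0.1 * (-(t - 0.5)^2 + 0.25), which is an assumption, as is treating sigma = 0 as a point mass and using Python's built-in round (which rounds halves to the nearest even integer). The valuation function is elided in the quoted text and is not reproduced.

```python
import math

LEVELS = list(range(11))  # q and r both range over {0, 1, ..., 10}


def normalized_gaussian(support, mean, std):
    """Evaluate a Gaussian density on a finite support, normalized to sum to 1.

    ASSUMPTION: std <= 0 is treated as a point mass on the support value
    closest to the mean.
    """
    if std <= 0:
        nearest = min(support, key=lambda x: abs(x - mean))
        return [1.0 if x == nearest else 0.0 for x in support]
    weights = [math.exp(-((x - mean) ** 2) / (2.0 * std ** 2)) for x in support]
    total = sum(weights)
    return [w / total for w in weights]


def prior(mean=3.0, std=3.0):
    """Prior belief mu(q): Gaussian with mean 3 and std 3 on {0, ..., 10}."""
    return normalized_gaussian(LEVELS, mean, std)


def accuracy_std(q, t):
    """ASSUMPTION: sigma = 0.1 * (-(t - 0.5)^2 + 0.25), and sigma = 0 if q = 0."""
    if q == 0:
        return 0.0
    return 0.1 * (-(t - 0.5) ** 2 + 0.25)


def accuracy_distribution(q, t):
    """lambda(r | q, t): Gaussian on {0, ..., 10} centered at round(q * t).

    t is the shared data quantity as a fraction in [0, 1].
    ASSUMPTION: the mean is the product q * t, rounded with Python's round().
    """
    return normalized_gaussian(LEVELS, round(q * t), accuracy_std(q, t))
```

Calling prior() returns eleven probabilities over q, and accuracy_distribution(q, t) returns eleven probabilities over r for a given quality q and data fraction t. Under the assumed reading of sigma, the spread vanishes at t = 0 and t = 1, so sharing all the data reveals q exactly.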