Frustratingly Easy Truth Discovery

Authors: Reshef Meir, Ofra Amir, Omer Ben-Porat, Tsviel Ben-Shabat, Gal Cohensius, Lirong Xia

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We prove that this estimates well the actual competence level and enables separating high and low quality workers in a wide spectrum of domains and statistical models. Under Gaussian noise, this simple estimate is the unique solution to the Maximum Likelihood Estimator with a constant regularization factor. Finally, weighing workers according to their average proximity in a crowdsourcing setting, results in substantial improvement over unweighted aggregation and other truth discovery algorithms in practice.
Researcher Affiliation | Academia | Reshef Meir (1), Ofra Amir (1), Omer Ben-Porat (1), Tsviel Ben-Shabat (1), Gal Cohensius (1), Lirong Xia (2); (1) Technion - Israel Institute of Technology; (2) Rensselaer Polytechnic Institute (RPI); {reshefm, oamir, omerbp}@ie.technion.ac.il, {tsviel, galcohensius}@gmail.com, xial@cs.rpi.edu
Pseudocode | Yes | ALGORITHM 1: (P-TDD) FOR REAL-VALUED DATA
Open Source Code | No | Most proofs, as well as additional empirical results are available in the full version of the paper on arXiv: https://arxiv.org/abs/1905.00629. This is a link to the paper on arXiv, not the source code.
Open Datasets | Yes | Datasets: We used the following datasets from five different domains. We write the used distance measure in each domain in brackets. Categorical (Hamming distance): GG, DOGS, FLAGS (Shah and Zhou 2015); Predict (Mandal, Radanovic, and Parkes 2020)... Real-valued (NSED): BUILDINGS (collected for this paper); TRI (Hart et al. 2018); and EMO (Snow et al. 2008)... Language (GLEU): The TRANSL dataset contains English translations of Japanese sentences (Braylan and Lease 2020)... Outlines (Jaccard): The Etch-a-Cell dataset contains bitmaps of the outline of a tumor in 2D slices of a cell (Spiers et al. 2021).
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or test sets. It mentions 'sampled n workers and m questions without repetition from each dataset (real or synthetic), and repeated the process at least 1000 times for every combination', which is a resampling strategy for robustness rather than a fixed split.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | No | The paper describes the algorithms and their evaluation on datasets but does not specify concrete hyperparameter values, training configurations, or system-level settings for the experiments. It mentions 'sampling n workers and m questions' but no further details on the experimental setup itself.
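
The Research Type response describes weighting workers by their average proximity to the other workers. A minimal Python sketch of that general idea for real-valued answers is given here. It is not the paper's Algorithm 1 (P-TDD): the function names, the mean-squared-difference distance (standing in for the NSED measure listed under Open Datasets), and the inverse-distance weighting rule are all assumptions made for illustration.

```python
import numpy as np


def average_pairwise_distance(answers: np.ndarray) -> np.ndarray:
    """Mean distance from each worker to every other worker.

    answers has shape (n_workers, m_questions). The distance between two
    workers is the mean squared difference of their answers (an assumed
    stand-in for the distance measure used in the paper).
    """
    n = answers.shape[0]
    # diff[i, j, k] = answers[i, k] - answers[j, k]
    diff = answers[:, None, :] - answers[None, :, :]
    dist = (diff ** 2).mean(axis=2)  # shape (n, n), zeros on the diagonal
    return dist.sum(axis=1) / (n - 1)


def proximity_weights(answers: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Hypothetical rule: weight is inversely proportional to average distance."""
    pi = average_pairwise_distance(answers)
    w = 1.0 / (pi + eps)
    return w / w.sum()  # normalize so the weights sum to 1


def weighted_mean(answers: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Aggregate per-question answers as a weighted average over workers."""
    return weights @ answers


# Toy data (made up for this sketch): three careful workers, one noisy worker.
demo_answers = np.array([
    [10.1, 19.9, 30.2, 39.8],
    [9.8, 20.1, 29.9, 40.3],
    [10.0, 20.2, 30.1, 40.0],
    [15.0, 12.0, 38.0, 33.0],
])
demo_weights = proximity_weights(demo_answers)
print("weights:", demo_weights)
print("weighted estimate:", weighted_mean(demo_answers, demo_weights))
print("unweighted estimate:", demo_answers.mean(axis=0))
```

In this toy setup the noisy worker sits far from everyone else, so it receives the smallest weight and pulls the aggregate less than it would in an unweighted mean. The paper's own estimator and weighting scheme should be taken from its Algorithm 1, not from this sketch.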