Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Labeling without Seeing? Blind Annotation for Privacy-Preserving Entity Resolution
Authors: Yixiang Yao, Weizhao Jin, Srivatsan Ravi
TMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct experiments to empirically evaluate the feasibility of using blind annotation to annotate datasets and the incurred overhead of homomorphic encryption. In general, domain oracles are asked to annotate datasets using blind annotation. The quality of the final annotation results is assessed by comparing them to the ground truth. ... We use precision, recall and F-measure as the standard evaluation metrics to measure the accuracy of blind annotation against the scores computed from ground truth labels. |
| Researcher Affiliation | Academia | Yixiang Yao EMAIL Department of Computer Science University of Southern California |
| Pseudocode | Yes | Algorithm 1: Commonly-used Functions ... Algorithm 2: Functions using BinFHE scheme |
| Open Source Code | No | The underlying HE program is implemented in OpenFHE (Al Badawi et al., 2022), an open-source project that efficiently and extensibly implements the post-quantum Fully Homomorphic Encryption schemes. |
| Open Datasets | Yes | We use the real-world entity resolution benchmark (Köpcke et al., 2010), which includes 4 tasks and lies in both e-commerce and bibliographic domains. ... One such synthetic dataset is Febrl (Freely Extensible Biomedical Record Linkage) (Christen, 2008), which is widely employed for generating census records containing fields such as name, sex, age, and address. |
| Dataset Splits | Yes | Specifically, we first randomly sample 50 labeled matches from each provided ground truth, and this covers at most 5% (50 records) of each dataset because one record could link to multiple records. Note that 50 records from each dataset are around 2.5-5% of the original dataset except for Scholar. |
| Hardware Specification | Yes | All the experiments are conducted on a Linux machine with an 8-core CPU @ 3.60 GHz and 32 GB RAM. |
| Software Dependencies | No | Specifically, the web GUI is implemented in Python, in which the DSL syntax is written in Extended Backus-Naur Form (EBNF) and parsed by the Lark library with a Look-Ahead Left-to-Right (LALR) parser. The underlying HE program is implemented in OpenFHE (Al Badawi et al., 2022)... The latter employs OpenMP, a multi-platform shared-memory parallel programming library... |
| Experiment Setup | Yes | Specifically, we set it to work in public-key encryption mode with the crypto context to be STD128, which guarantees more than 128 bits of security for classical computer attacks. ... When t rounds have finished, the protocol ends: the pairs whose value is true (consensus achieved) in F are added to the final ground truth, and others are discarded. Therefore, the ground truth set G is constructed as G = {(i, j, l) | (i, j, l) ∈ G_t, F(i, j) = true}. Note that increasing t tends to improve performance, but it also raises the labeling cost. An empirical analysis of the effect of t is presented in Section 5.2. |
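The ground-truth construction quoted in the Experiment Setup row can be sketched in a few lines. This is a minimal illustration only, not the paper's implementation: the names `build_ground_truth`, `labeled_pairs`, and `consensus` are hypothetical, with `labeled_pairs` standing in for G_t and `consensus` for the flag table F.

```python
def build_ground_truth(labeled_pairs, consensus):
    """Keep (i, j, l) triples whose pair reached consensus.

    labeled_pairs: iterable of (i, j, l) triples after round t (G_t)
    consensus:     dict mapping (i, j) -> bool (the flag table F)
    """
    # G = {(i, j, l) | (i, j, l) in G_t, F(i, j) = true}
    return [(i, j, l) for (i, j, l) in labeled_pairs
            if consensus.get((i, j), False)]

# Toy example: two pairs reach consensus, one is discarded.
G_t = [(1, 2, "match"), (3, 4, "non-match"), (5, 6, "match")]
F = {(1, 2): True, (3, 4): False, (5, 6): True}
G = build_ground_truth(G_t, F)
print(G)  # [(1, 2, 'match'), (5, 6, 'match')]
```

Pairs without consensus are dropped rather than relabeled, which matches the quoted protocol: raising t gives more pairs a chance to reach consensus at extra labeling cost.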