Entity Alignment with Noisy Annotations from Large Language Models
Authors: Shengyuan Chen, Qinggang Zhang, Junnan Dong, Wen Hua, Qing Li, Xiao Huang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the advantages of LLM4EA on four benchmark datasets in terms of effectiveness, robustness, and efficiency. |
| Researcher Affiliation | Academia | All six authors (Shengyuan Chen, Qinggang Zhang, Junnan Dong, Wen Hua, Qing Li, Xiao Huang) are affiliated with the Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR. |
| Pseudocode | Yes | Algorithm 1: "The greedy label refinement algorithm" (a generic greedy-refinement sketch follows the table). |
| Open Source Code | Yes | We have provided the code for the framework, accessible via this URL: https://github.com/chensyCN/llm4ea_official. |
| Open Datasets | Yes | In this study, we use the widely adopted OpenEA dataset (Sun et al., 2020), including two monolingual datasets (D-W-15K and D-Y-15K) and two cross-lingual datasets (EN-DE-15K and EN-FR-15K). OpenEA comes in two versions: "V1", the normal version, and "V2", the dense version. We employ "V2" in the experiments in the main text. (A dataset-selection sketch follows the table.) |
| Dataset Splits | No | The paper mentions training but does not explicitly state training/validation/test splits with percentages or counts; it refers to standard datasets without specifying their splits. |
| Hardware Specification | Yes | Our experiments were conducted on a server equipped with six NVIDIA GeForce RTX 3090 GPUs, 48 Intel(R) Xeon(R) Silver 4214R CPUs, and 376GB of host memory. |
| Software Dependencies | Yes | The details of the software packages used in our experiments are listed in Table 4 (package configurations): tqdm 4.66.2, numpy 1.24.4, scipy 1.10.1, tensorflow 2.7.0, keras 2.7.0, openai 1.30.1. (A version-check sketch follows the table.) |
| Experiment Setup | Yes | Setup of LLM4EA. We employ GPT-3.5 as the default LLM due to its cost efficiency. Other parameters are n = 3, n_lr, k = 20, δ0 = 0.5, δ1 = 0.9. (A configuration sketch follows the table.) |
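
The paper's Algorithm 1 is not reproduced in this report. As a rough illustration of what a greedy label refinement loop can look like, the sketch below accepts the highest-confidence pseudo-labels first under a one-to-one alignment constraint. It is a hedged stand-in, not the paper's Algorithm 1; the function name `greedy_refine`, the `confidence` map, and the `threshold` parameter are all assumptions made for illustration.

```python
# A minimal, generic sketch of greedy label refinement over noisy
# pseudo-labels. NOT the paper's Algorithm 1; all names here are
# illustrative assumptions.

def greedy_refine(pseudo_labels, confidence, threshold=0.5):
    """Greedily accept the highest-confidence pseudo-labels first,
    enforcing a one-to-one alignment between the two entity sets.

    pseudo_labels: list of (source_entity, target_entity) pairs
    confidence:    dict mapping each pair to a score in [0, 1]
    """
    used_src, used_tgt = set(), set()
    refined = []
    # Visit candidate pairs from most to least confident.
    for pair in sorted(pseudo_labels, key=lambda p: confidence[p], reverse=True):
        src, tgt = pair
        if confidence[pair] < threshold:
            break  # remaining pairs are too noisy to trust
        if src in used_src or tgt in used_tgt:
            continue  # respect the one-to-one alignment constraint
        refined.append(pair)
        used_src.add(src)
        used_tgt.add(tgt)
    return refined


pairs = [("e1", "f1"), ("e1", "f2"), ("e2", "f2")]
scores = {("e1", "f1"): 0.95, ("e1", "f2"): 0.80, ("e2", "f2"): 0.70}
print(greedy_refine(pairs, scores))  # [('e1', 'f1'), ('e2', 'f2')]
```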
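
For quick reference, the dataset selection reported in the Open Datasets row can be summarized as a small configuration. The `DATASETS` structure and its keys are illustrative assumptions; only the dataset names and the "V2" choice come from the paper.

```python
# The four OpenEA benchmark datasets used in the experiments, plus the
# version choice ("V2", the dense version). Structure is illustrative.
DATASETS = {
    "monolingual": ["D-W-15K", "D-Y-15K"],
    "cross-lingual": ["EN-DE-15K", "EN-FR-15K"],
}
VERSION = "V2"  # dense version, used in the main-text experiments
```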
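
Given the pinned versions from Table 4, a standard-library script can check whether a local environment matches them. The `PINNED` mapping copies the reported versions; the script itself is a convenience sketch, not part of the authors' code.

```python
# Sanity-check installed package versions against the pins reported in
# Table 4 of the paper. Uses only the Python standard library.
from importlib.metadata import version, PackageNotFoundError

PINNED = {
    "tqdm": "4.66.2",
    "numpy": "1.24.4",
    "scipy": "1.10.1",
    "tensorflow": "2.7.0",
    "keras": "2.7.0",
    "openai": "1.30.1",
}

for pkg, expected in PINNED.items():
    try:
        installed = version(pkg)
    except PackageNotFoundError:
        print(f"{pkg}: NOT INSTALLED (expected {expected})")
        continue
    status = "OK" if installed == expected else f"MISMATCH (expected {expected})"
    print(f"{pkg}: {installed} {status}")
```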
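
Finally, the reported hyperparameters can be collected into a single configuration dict. The key names and the model identifier are assumptions for illustration; the values are the ones stated in the setup row. The extracted text renders one parameter as "n_lr" with no clear value, so it is deliberately left out rather than guessed.

```python
# Hypothetical configuration collecting the reported LLM4EA
# hyperparameters. Key names are illustrative; values are as reported.
CONFIG = {
    "llm": "gpt-3.5-turbo",  # assumed identifier for "GPT-3.5"
    "n": 3,                  # as reported
    "k": 20,                 # as reported
    "delta_0": 0.5,          # δ0
    "delta_1": 0.9,          # δ1
    # An additional parameter appears as "n_lr" in the extracted setup,
    # but its value is ambiguous in the source, so it is omitted here.
}
```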