Hierarchical Matching Network for Heterogeneous Entity Resolution
Authors: Cheng Fu, Xianpei Han, Jiaming He, Le Sun
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on ten datasets. Experimental results show that, by adaptively selecting cross-attribute matching objects for tokens and effectively identifying important information of each attribute, our method significantly outperforms previous methods on all three kinds of datasets (homogeneous, heterogeneous and dirty). |
| Researcher Affiliation | Academia | Cheng Fu (1,3), Xianpei Han (1,2), Jiaming He (4), and Le Sun (1,2). Affiliations: (1) Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences; (2) State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences; (3) University of Chinese Academy of Sciences; (4) Brandeis University |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. The methodology is described in text and mathematical formulas. |
| Open Source Code | Yes | Our code is freely available online: https://github.com/cipnlu/EntityMatcher |
| Open Datasets | Yes | We conduct experiments on ten datasets of three types, whose statistics are shown in Table 1. Four homogeneous datasets: Walmart-Amazon1, Amazon-Google, DBLP-ACM1, and DBLP-Scholar1, which are commonly used real-world datasets. Because this paper focuses on entity matching, we use their after-blocking versions provided by Mudgal et al. [2018]. Three dirty datasets: Walmart-Amazon2, DBLP-ACM2, and DBLP-Scholar2, which are individually derived from Walmart-Amazon1, DBLP-ACM1, and DBLP-Scholar1 by randomly moving the value of each attribute to the title attribute of the same tuple with 50% probability (see the derivation sketch after this table). These datasets are also provided by Mudgal et al. [2018]. |
| Dataset Splits | Yes | For model learning, we use the same 60%/20%/20% train/dev/test split as in [Mudgal et al., 2018]. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or detailed computer specifications) used for running the experiments were provided. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.x, PyTorch x.y.z, TensorFlow x.y.z) were found. The paper mentions using "FastText 300-dimensional word embeddings" and the "Adam algorithm for optimization" but gives no version details. |
| Experiment Setup | Yes | The hidden size of each GRU layer is set to 150. For model learning, we use the same 60%/20%/20% train/dev/test split as in [Mudgal et al., 2018], and use the Adam algorithm for optimization. (A setup sketch follows this table.) |
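
The dirty-dataset derivation quoted in the Open Datasets row is mechanical enough to sketch in a few lines. Below is a minimal Python illustration of that procedure; the function name `make_dirty`, the dict-based tuple representation, and the choice to append the moved value onto the existing title (rather than replace it) are assumptions made for illustration, not the released preprocessing code.

```python
import random

def make_dirty(row, title_attr="title", p=0.5, seed=None):
    # With probability p, move each non-title attribute value into the
    # title attribute of the same tuple and empty the original slot --
    # the corruption described for the dirty datasets of Mudgal et al. [2018].
    rng = random.Random(seed)
    dirty = dict(row)
    for attr, value in row.items():
        if attr == title_attr or not value:
            continue
        if rng.random() < p:
            dirty[title_attr] = f"{dirty[title_attr]} {value}".strip()
            dirty[attr] = ""
    return dirty

# Hypothetical Walmart-Amazon-style product record.
row = {"title": "iPhone 11 64GB", "brand": "Apple", "price": "699.00"}
print(make_dirty(row, seed=0))
```

Whether the moved value replaces or concatenates with the existing title is not specified in the quote; concatenation is assumed here.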
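The experiment-setup rows fix only a few concrete values: a GRU hidden size of 150, FastText 300-dimensional word embeddings, the Adam optimizer, and a 60%/20%/20% train/dev/test split. The PyTorch sketch below wires those values together; the `AttributeEncoder` module, its bidirectionality, and the random index split are illustrative assumptions, since the report does not reproduce the authors' training code and the paper's hierarchical matching network is more involved.

```python
import torch
import torch.nn as nn

EMBED_DIM = 300  # FastText 300-dimensional word embeddings (from the paper)
HIDDEN = 150     # hidden size of each GRU layer (from the paper)

class AttributeEncoder(nn.Module):
    # A stand-in attribute-level encoder: one bidirectional GRU over the
    # token embeddings of a single attribute value.
    def __init__(self, embed_dim=EMBED_DIM, hidden=HIDDEN):
        super().__init__()
        self.gru = nn.GRU(embed_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, token_embeds):        # (batch, seq_len, embed_dim)
        states, _ = self.gru(token_embeds)  # (batch, seq_len, 2 * hidden)
        return states

encoder = AttributeEncoder()
optimizer = torch.optim.Adam(encoder.parameters())  # Adam, as in the paper

# 60%/20%/20% train/dev/test split over labeled pairs, as in Mudgal et al. [2018].
n = 1000  # hypothetical number of labeled record pairs
perm = torch.randperm(n)
n_train, n_dev = int(0.6 * n), int(0.2 * n)
train_idx = perm[:n_train]
dev_idx = perm[n_train:n_train + n_dev]
test_idx = perm[n_train + n_dev:]
```

The split here is drawn randomly for illustration; Mudgal et al. [2018] distribute fixed splits with the datasets, which is what the paper reports using.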