Hierarchical Matching Network for Heterogeneous Entity Resolution

Authors: Cheng Fu, Xianpei Han, Jiaming He, Le Sun

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on ten datasets. Experimental results show that, by adaptively selecting cross-attribute matching objects for tokens and effectively identifying important information of each attribute, our method significantly outperforms previous methods on all three kinds of datasets (homogeneous, heterogeneous, and dirty).
Researcher Affiliation | Academia | Cheng Fu (1,3), Xianpei Han (1,2), Jiaming He (4), and Le Sun (1,2). Affiliations: (1) Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences; (2) State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences; (3) University of Chinese Academy of Sciences; (4) Brandeis University.
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. The methodology is described in text and mathematical formulas.
Open Source Code | Yes | Our code is freely available online: https://github.com/cipnlu/EntityMatcher
Open Datasets | Yes | We conduct experiments on ten datasets of three types, whose statistics are shown in Table 1. Four homogeneous datasets: Walmart-Amazon1, Amazon-Google, DBLP-ACM1, and DBLP-Scholar1, which are commonly used real-world datasets; because this paper focuses on entity matching, we use their after-blocking versions provided by Mudgal et al. [2018]. Three dirty datasets: Walmart-Amazon2, DBLP-ACM2, and DBLP-Scholar2, which are individually derived from Walmart-Amazon1, DBLP-ACM1, and DBLP-Scholar1 by randomly moving the value of each attribute to the title attribute of the same tuple with 50% probability; these datasets are also provided by Mudgal et al. [2018]. (A sketch of this dirtying procedure appears after the table.)
Dataset Splits | Yes | For model learning, we use the same 60%/20%/20% train/dev/test split as in [Mudgal et al., 2018].
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or other machine specifications) used for running the experiments were provided.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.x, PyTorch x.y.z, TensorFlow x.y.z) were found. The paper mentions using fastText 300-dimensional word embeddings and the Adam algorithm for optimization, but gives no version details.
Experiment Setup | Yes | The hidden size of each GRU layer is set to 150. For model learning, we use the same 60%/20%/20% train/dev/test split as in [Mudgal et al., 2018], and use the Adam algorithm for optimization.
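
The dirty-dataset construction described under Open Datasets is straightforward to illustrate. Below is a minimal Python sketch, assuming dictionary-shaped tuples and a title attribute named "title"; the function name and field names are hypothetical and are not taken from the released code.

    import random

    def make_dirty(record, title_attr="title", p=0.5, seed=None):
        # Illustrative sketch: with probability p, move each non-title attribute
        # value into the title attribute of the same tuple, mirroring the
        # description of the dirty datasets provided by Mudgal et al. [2018].
        rng = random.Random(seed)
        dirty = dict(record)
        for attr, value in list(dirty.items()):
            if attr == title_attr or not value:
                continue
            if rng.random() < p:
                # Append the displaced value to the title and blank the source field.
                dirty[title_attr] = (dirty[title_attr] + " " + value).strip()
                dirty[attr] = ""
        return dirty

    # Example on a hypothetical product tuple
    clean = {"title": "iphone 6", "brand": "apple", "price": "649"}
    print(make_dirty(clean, seed=0))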
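
The Experiment Setup row reports a 60%/20%/20% train/dev/test split, a GRU hidden size of 150, fastText 300-dimensional embeddings, and Adam optimization. The following PyTorch sketch shows one way such a configuration could look; the dataset object and the learning rate are assumptions, as neither is specified in the quoted text.

    import torch
    import torch.nn as nn
    from torch.utils.data import random_split

    def split_60_20_20(dataset, seed=42):
        # 60%/20%/20% train/dev/test split, matching the protocol of
        # Mudgal et al. [2018] cited in the paper.
        n = len(dataset)
        n_train, n_dev = int(0.6 * n), int(0.2 * n)
        n_test = n - n_train - n_dev
        generator = torch.Generator().manual_seed(seed)
        return random_split(dataset, [n_train, n_dev, n_test], generator=generator)

    # GRU encoder with hidden size 150 over 300-dimensional (fastText-style)
    # token embeddings, optimized with Adam. The learning rate is an assumed
    # placeholder; the paper's quoted setup does not report one.
    encoder = nn.GRU(input_size=300, hidden_size=150, batch_first=True)
    optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)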