Hierarchical Matching Network for Heterogeneous Entity Resolution

Authors: Cheng Fu, Xianpei Han, Jiaming He, Le Sun

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on ten datasets. Experimental results show that, by adaptively selecting cross-attribute matching objects for tokens and effectively identifying important information of each attribute, our method significantly outperforms previous methods on all three kinds of datasets (homogeneous, heterogeneous, and dirty).
Researcher Affiliation | Academia | Cheng Fu (1,3), Xianpei Han (1,2), Jiaming He (4), and Le Sun (1,2). Affiliations: (1) Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences; (2) State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences; (3) University of Chinese Academy of Sciences; (4) Brandeis University.
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. The methodology is described in text and mathematical formulas.
Open Source Code | Yes | Our code is freely available online: https://github.com/cipnlu/EntityMatcher
Open Datasets | Yes | We conduct experiments on ten datasets of three types, whose statistics are shown in Table 1. Four homogeneous datasets: Walmart-Amazon1, Amazon-Google, DBLP-ACM1, and DBLP-Scholar1, which are commonly used real-world datasets; because this paper focuses on entity matching, we use their after-blocking versions provided by Mudgal et al. [2018]. Three dirty datasets: Walmart-Amazon2, DBLP-ACM2, and DBLP-Scholar2, which are individually derived from Walmart-Amazon1, DBLP-ACM1, and DBLP-Scholar1 by randomly moving the value of each attribute to the title attribute of the same tuple with 50% probability; these datasets are also provided by Mudgal et al. [2018]. (A sketch of this dirtying procedure appears after the table.)
Dataset Splits | Yes | For model learning, we use the same 60%/20%/20% train/dev/test split as in [Mudgal et al., 2018].
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or other machine specifications) used for running the experiments were provided.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.x, PyTorch x.y.z, TensorFlow x.y.z) were found. The paper mentions using fastText 300-dimensional word embeddings and the Adam algorithm for optimization, but gives no version details.
Experiment Setup | Yes | The hidden size of each GRU layer is set to 150. For model learning, we use the same 60%/20%/20% train/dev/test split as in [Mudgal et al., 2018], and use the Adam algorithm for optimization.
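
The dirty-dataset construction described under Open Datasets is straightforward to illustrate. Below is a minimal Python sketch, assuming dictionary-shaped tuples and a title attribute named "title"; the function name and field names are hypothetical and are not taken from the released code.

    import random

    def make_dirty(record, title_attr="title", p=0.5, seed=None):
        # Illustrative sketch: with probability p, move each non-title attribute
        # value into the title attribute of the same tuple, mirroring the
        # description of the dirty datasets provided by Mudgal et al. [2018].
        rng = random.Random(seed)
        dirty = dict(record)
        for attr, value in list(dirty.items()):
            if attr == title_attr or not value:
                continue
            if rng.random() < p:
                # Append the displaced value to the title and blank the source field.
                dirty[title_attr] = (dirty[title_attr] + " " + value).strip()
                dirty[attr] = ""
        return dirty

    # Example on a hypothetical product tuple
    clean = {"title": "iphone 6", "brand": "apple", "price": "649"}
    print(make_dirty(clean, seed=0))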
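
The Experiment Setup row reports a 60%/20%/20% train/dev/test split, a GRU hidden size of 150, fastText 300-dimensional embeddings, and Adam optimization. The following PyTorch sketch shows one way such a configuration could look; the dataset object and the learning rate are assumptions, as neither is specified in the quoted text.

    import torch
    import torch.nn as nn
    from torch.utils.data import random_split

    def split_60_20_20(dataset, seed=42):
        # 60%/20%/20% train/dev/test split, matching the protocol of
        # Mudgal et al. [2018] cited in the paper.
        n = len(dataset)
        n_train, n_dev = int(0.6 * n), int(0.2 * n)
        n_test = n - n_train - n_dev
        generator = torch.Generator().manual_seed(seed)
        return random_split(dataset, [n_train, n_dev, n_test], generator=generator)

    # GRU encoder with hidden size 150 over 300-dimensional (fastText-style)
    # token embeddings, optimized with Adam. The learning rate is an assumed
    # placeholder; the paper's quoted setup does not report one.
    encoder = nn.GRU(input_size=300, hidden_size=150, batch_first=True)
    optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)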