GraphER: Token-Centric Entity Resolution with Graph Convolutional Neural Networks

Authors: Bing Li, Wei Wang, Yifang Sun, Linhan Zhang, Muhammad Asif Ali, Yi Wang

AAAI 2020, pp. 8172-8179

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on two real-world datasets demonstrate that our model stably outperforms state-of-the-art models.
Researcher Affiliation | Academia | (1) School of Computer Science and Engineering, University of New South Wales, Australia ({bing.li, weiw, yifang.sun, muhammadasif.ali}@unsw.edu.au); (2) Dongguan University of Technology, China (wangyi@dgut.edu.cn)
Pseudocode | No | The paper describes the model architecture and steps but does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | We used two datasets, each contains two tables, and a list of golden matches. ... Amazon-Google (Köpcke, Thor, and Rahm 2010). ... BeerAdvo-RateBeer (Mudgal et al. 2018).
Dataset Splits | Yes | For both datasets, we use the same 3:1:1 train/dev/test split as in (Mudgal et al. 2018).
Hardware Specification | Yes | The Titan V used for this research was donated by the NVIDIA Corporation.
Software Dependencies | No | Token embeddings were initialized using 300-dimensional pretrained GloVe vectors. While GloVe is a software dependency, no specific version number is provided.
Experiment Setup | Yes | For ER-GCN, the size of Θ(1) was set to |V| × 300, where |V| was the number of nodes in the corresponding ER-Graph, and the size of Θ(2) was 300 × 200. The textual window size was set to 20. Token embeddings were initialized using 300-dimensional pretrained GloVe vectors, while unknown words were initialized with an embedding drawn from a uniform distribution U(-0.25, 0.25). All weight matrices in ER-GCN are initialized using Xavier initialization (Glorot and Bengio 2010) with gain 1. d_a in Eq. 10 was set to 350. For the CNN used in the aggregation layer, we took three filter widths [1, 2, 3], each filter width having 150 kernels. For the final prediction layer, the number of hidden units of the Highway Net is set to 4000. For optimization, we used Adam (Kingma and Ba 2015) with an initial learning rate of 0.001, a dropout rate of 0.5, and gradient clipping to 5; the batch size was 32 and 3 for the Amazon-Google and BeerAdvo-RateBeer datasets, respectively; all other hyper-parameters were kept at their default values. We trained the model for a maximum of 100 epochs, and stopped training if the validation loss did not decrease for 10 consecutive epochs.
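
The initialization details above are concrete enough to sketch in code. Below is a minimal PyTorch sketch of the reported scheme: 300-dimensional GloVe vectors for known tokens, draws from U(-0.25, 0.25) for unknown tokens, and Xavier initialization with gain 1 for weight matrices. Since the paper releases no code, the function names and tensor layout here are assumptions, not GraphER's actual implementation.

```python
import torch
import torch.nn as nn

def make_token_embeddings(glove_vectors: torch.Tensor,
                          known_mask: torch.Tensor) -> nn.Embedding:
    """Build the token embedding table described in the paper.

    glove_vectors: (vocab_size, 300) pretrained GloVe vectors.
    known_mask:    (vocab_size,) boolean mask of tokens found in GloVe.
    Both argument names are hypothetical; the paper specifies no API.
    """
    vocab_size, dim = glove_vectors.shape            # dim == 300 in the paper
    # Unknown words: embeddings drawn from U(-0.25, 0.25), as reported.
    weight = torch.empty(vocab_size, dim).uniform_(-0.25, 0.25)
    weight[known_mask] = glove_vectors[known_mask]   # keep pretrained vectors
    return nn.Embedding.from_pretrained(weight, freeze=False)

def init_weights(module: nn.Module) -> None:
    """Xavier initialization (Glorot and Bengio 2010) with gain 1."""
    if isinstance(module, (nn.Linear, nn.Conv1d)):
        nn.init.xavier_uniform_(module.weight, gain=1.0)
```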
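The aggregation layer is likewise specified precisely enough for a sketch: three convolutional filter widths [1, 2, 3] with 150 kernels each, which is read here as 1-D convolutions max-pooled over time and concatenated. The 200-dimensional input width follows the reported size of Θ(2); the class name and the pooling choice are assumptions, since the paper gives hyper-parameters but no code.

```python
class AggregationCNN(nn.Module):
    """Sketch of the aggregation-layer CNN: widths [1, 2, 3], 150 kernels each."""

    def __init__(self, in_dim: int = 200, n_kernels: int = 150):
        super().__init__()
        # One Conv1d per filter width; in_dim = 200 matches the Θ(2) output size.
        self.convs = nn.ModuleList(
            nn.Conv1d(in_dim, n_kernels, kernel_size=w) for w in (1, 2, 3)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, in_dim); Conv1d expects (batch, channels, seq_len).
        x = x.transpose(1, 2)
        # Max-pool each feature map over time; concatenation gives 3 * 150 = 450 dims.
        return torch.cat([conv(x).max(dim=2).values for conv in self.convs], dim=1)
```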
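Finally, the optimization settings translate into a straightforward training loop: Adam with an initial learning rate of 0.001, gradient clipping to 5, at most 100 epochs, and early stopping once the validation loss fails to improve for 10 consecutive epochs. In this sketch, `evaluate` is a hypothetical helper and the binary loss is an assumption; dropout (rate 0.5) is assumed to live inside the model, and the batch sizes of 32 and 3 come from the two datasets as reported.

```python
def train(model: nn.Module, train_loader, dev_loader,
          max_epochs: int = 100, patience: int = 10) -> None:
    model.apply(init_weights)                                  # Xavier, gain 1
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # initial LR 0.001
    criterion = nn.BCEWithLogitsLoss()   # assumption: binary match/non-match loss
    best_dev_loss, stale_epochs = float("inf"), 0
    for epoch in range(max_epochs):                            # at most 100 epochs
        model.train()
        # Batch size: 32 for Amazon-Google, 3 for BeerAdvo-RateBeer.
        for inputs, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(inputs), labels)
            loss.backward()
            # Clip gradients to 5, as reported.
            nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
            optimizer.step()
        dev_loss = evaluate(model, dev_loader, criterion)      # hypothetical helper
        if dev_loss < best_dev_loss:
            best_dev_loss, stale_epochs = dev_loss, 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:   # no improvement for 10 epochs: stop
                break
```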