Improving the Efficiency and Effectiveness for BERT-based Entity Resolution
Authors: Bing Li, Yukai Miao, Yaoshu Wang, Yifang Sun, Wei Wang
AAAI 2021, pp. 13226-13233 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on multiple datasets demonstrate that our model significantly outperforms state-of-the-art models (including BERT) in both efficiency and effectiveness. |
| Researcher Affiliation | Academia | Bing Li¹, Yukai Miao¹, Yaoshu Wang², Yifang Sun³, Wei Wang⁴,¹. ¹School of Computer Science and Engineering, University of New South Wales, Australia; ²Shenzhen Institute of Computing Sciences, Shenzhen University, China; ³School of Computer Science and Engineering, Northeastern University, China; ⁴Dongguan University of Technology, China |
| Pseudocode | No | The paper describes the model architecture and components in text and diagrams, but does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a specific link or explicit statement about releasing the source code for the BERT-ER model developed in this paper. |
| Open Datasets | Yes | We used four widely used benchmark datasets covering diverse domains such as products, music, and scholar. Table 1 lists some statistics (the first four datasets). Each dataset contains a list of after-blocking tuple pairs followed with gold labels. All datasets have been split into train/dev/test subsets by Mudgal et al. (2018). ... [2] http://pages.cs.wisc.edu/~anhai/data1/deepmatcher_data/ [3] https://sites.google.com/site/anhaidgroup/useful-stuff/data [4] https://dbs.uni-leipzig.de/research/projects/object_matching/benchmark_datasets_for_entity_resolution |
| Dataset Splits | Yes | All datasets have been split into train/dev/test subsets by Mudgal et al. (2018). |
| Hardware Specification | Yes | Our model was implemented using Pytorch 1.4 with Python 3.7, and ran on an Nvidia Titan V GPU. ... The Titan V used for this research was donated by the NVIDIA Corporation. |
| Software Dependencies | Yes | Our model was implemented using Pytorch 1.4 with Python 3.7... We used the popular transformers library for the pre-trained BERT model. (See the loading sketch below the table.) |
| Experiment Setup | Yes | The BERT was initialized using a standard BERT-BASE model. ... padded the tokenized sequences to a max length of 120. ... The hash bits k and tolerance threshold q were set to 8 and 1. ... m was set to 2k = 16, and the regularizer weight γ was 0.01. We undersampled negative instances to yield a 1:10 pos/neg rate. ... Kernel sizes of the convolutional layer were set to [1, 2, 3], each with c = 128 kernels. The balance weight α = 0.2. ... we used AdamW ... with an initial learning rate of 10^-5, eps of 10^-8, and gradient clipping of 5; the batch size was set to 32; all other hyper-parameters were their default values. In each round, the model was run 10 times with a maximum of 50 epochs. (See the training-setup sketch below the table.) |
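
Since the paper does not release code, the two sketches below only illustrate the reported dependencies and settings; they are not the authors' implementation.

The Software Dependencies row states that the model uses the transformers library with a standard BERT-BASE encoder and that pairs are padded to a max length of 120. A minimal loading sketch follows, assuming the `bert-base-uncased` checkpoint and the current transformers tokenizer API (both assumptions; the paper does not name a specific checkpoint or library version), with a made-up example record pair:

```python
# Hypothetical sketch: load a pre-trained BERT-BASE encoder via transformers and
# tokenize one entity pair to the reported max length of 120.
# "bert-base-uncased" and the example strings are assumptions, not from the paper.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

left = "iPhone 12 Pro 128GB Graphite"                 # serialized left tuple (illustrative)
right = "Apple iPhone 12 Pro (128 GB) - Graphite"     # serialized right tuple (illustrative)

# Encode the pair as a single [CLS] left [SEP] right [SEP] sequence,
# padded/truncated to 120 tokens as reported in the experiment setup.
inputs = tokenizer(left, right, padding="max_length", truncation=True,
                   max_length=120, return_tensors="pt")

with torch.no_grad():
    outputs = encoder(**inputs)
print(outputs.last_hidden_state.shape)  # (1, 120, 768) for BERT-BASE
```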
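The Experiment Setup row reports AdamW with an initial learning rate of 10^-5, eps of 10^-8, gradient clipping at 5, batch size 32, and up to 50 epochs. The sketch below wires those numbers into a generic PyTorch training loop; the model, data, and loss are placeholders, since the paper's BERT-ER architecture (hashing layer, convolutional matcher, undersampling pipeline) is not public:

```python
# Hypothetical training-setup sketch using only the hyper-parameters quoted above.
# The linear "model" and random tensors stand in for the unpublished BERT-ER matcher.
import torch
from torch.utils.data import DataLoader, TensorDataset

model = torch.nn.Linear(768, 2)                        # placeholder matcher head
dataset = TensorDataset(torch.randn(320, 768),         # placeholder pair embeddings
                        torch.randint(0, 2, (320,)))   # placeholder match labels
loader = DataLoader(dataset, batch_size=32, shuffle=True)  # batch size 32

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, eps=1e-8)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(50):                                # "maximum of 50 epochs"
    for features, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(features), labels)
        loss.backward()
        # Gradient clipping with the reported threshold of 5.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
        optimizer.step()
```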