Improving Neural Fine-Grained Entity Typing With Knowledge Attention

Authors: Ji Xin, Yankai Lin, Zhiyuan Liu, Maosong Sun

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results and case studies on real-world datasets demonstrate that our model significantly outperforms other state-of-the-art methods.
Researcher Affiliation | Academia | Ji Xin (1), Yankai Lin (2), Zhiyuan Liu (2), Maosong Sun (2). (1) Department of Physics, Tsinghua University, Beijing, China. (2) State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing, China. {xinj14,linyk14}@mails.tsinghua.edu.cn, {liuzy,sms}@tsinghua.edu.cn
Pseudocode | No | As illustrated in Figure 1, our model mainly consists of two parts. Firstly, we build a neural network to generate context and entity mention representations. Secondly, depending on the entity mention, we use knowledge attention to focus on important context words and improve the quality of context representation.
Open Source Code | Yes | Code and data for this paper can be found at https://github.com/thunlp/KNET.
Open Datasets | Yes | We establish two datasets for experiments: one automatically built from Wikipedia and Freebase, one manually labeled. ... Concretely, we search Wikipedia text for sentences with an anchor link which links to another Wikipedia page. The page can be further connected to a Freebase entity, whose type labels are shown in Freebase. Without loss of generality, we choose entities in FB15K when searching through Wikipedia. FB15K is a subset of Freebase containing common entities introduced in (Bordes et al. 2013).
Dataset Splits | Yes | Similar to (Ling and Weld 2012), we employ Wikipedia and Freebase to generate train, validation and test datasets using distant supervision (Mintz et al. 2009).
Hardware Specification | No | We use Adam Optimizer (Kingma and Ba 2014) and mini-batch of size B for parameter optimization. We also use the implementation of TransE from (Lin et al. 2015) to obtain entity embeddings. To avoid overfitting, we employ Dropout (Srivastava et al. 2014) on entity mention representation.
Software Dependencies | No | Following (Shimaoka et al. 2017), we use pre-trained word embeddings from (Pennington, Socher, and Manning 2014). We use Adam Optimizer (Kingma and Ba 2014) and mini-batch of size B for parameter optimization. We also use the implementation of TransE from (Lin et al. 2015) to obtain entity embeddings.
Experiment Setup | Yes | We explore different sets of hyperparameter settings and determine Adam optimizer learning rate λ among {0.01, 0.005, 0.001}, hidden size of LSTM among {100, 150, 200}, word vector size among {50, 100, 300}, window size L among {5, 10, 15} and batch size B among {100, 500, 1,000}, based on performance on the validation set. The hyperparameter settings are shown in Table 3.
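
To make the Pseudocode row above more concrete: the quoted passage describes attention over context words guided by knowledge about the entity mention. Below is a minimal NumPy sketch of that idea, assuming a bilinear score between (bi)LSTM context hidden states and a knowledge-base entity embedding such as TransE; the function names, dimensions, and scoring form are illustrative assumptions, not taken from the paper or the KNET repository.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def knowledge_attention(context_hiddens, entity_embedding, W):
    """Weight context positions by their relevance to the entity's KB embedding.

    context_hiddens:  (L, d_h) array of (bi)LSTM hidden states over context words
    entity_embedding: (d_e,)   KB embedding of the mentioned entity (e.g., TransE)
    W:                (d_h, d_e) learned bilinear weight matrix

    Returns the attention-weighted context representation and the weights.
    """
    scores = context_hiddens @ W @ entity_embedding   # (L,) relevance scores
    alpha = softmax(scores)                           # (L,) attention weights
    context_rep = alpha @ context_hiddens             # (d_h,) weighted sum
    return context_rep, alpha

# Toy usage with random values
rng = np.random.default_rng(0)
h = rng.normal(size=(10, 200))      # 10 context words, hidden size 200
e = rng.normal(size=100)            # entity embedding of size 100
W = rng.normal(size=(200, 100))
rep, alpha = knowledge_attention(h, e, W)
```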
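
The Open Datasets and Dataset Splits rows quote a distant-supervision pipeline: Wikipedia sentences whose anchor links resolve to FB15K entities are labeled with those entities' Freebase types. A rough sketch under that reading follows; every lookup structure here is a hypothetical stand-in for the actual Wikipedia and Freebase processing.

```python
def build_distant_supervision_examples(sentences_with_anchors,
                                        page_to_entity,
                                        entity_types,
                                        fb15k_entities):
    """Turn Wikipedia anchor links into distantly supervised typing examples.

    sentences_with_anchors: iterable of (sentence, [(mention_span, target_page), ...])
    page_to_entity:         dict mapping Wikipedia pages to Freebase entity ids
    entity_types:           dict mapping Freebase entity ids to lists of type labels
    fb15k_entities:         set of entity ids restricted to FB15K
    """
    examples = []
    for sentence, anchors in sentences_with_anchors:
        for span, page in anchors:
            entity = page_to_entity.get(page)
            if entity is None or entity not in fb15k_entities:
                continue                      # keep only FB15K entities
            labels = entity_types.get(entity, [])
            if labels:                        # Freebase types become the labels
                examples.append({"sentence": sentence,
                                 "mention": span,
                                 "labels": labels})
    return examples
```

The resulting examples would then be partitioned into train, validation, and test sets, as the Dataset Splits row indicates.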
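
The Experiment Setup row lists candidate hyperparameter values selected by validation performance, with Adam as the optimizer per the Hardware Specification and Software Dependencies rows. A hedged sketch of such a selection loop is shown below; `train_and_evaluate` is a hypothetical helper that trains one configuration and returns a validation score.

```python
from itertools import product

# Candidate values quoted in the Experiment Setup row.
GRID = {
    "learning_rate": [0.01, 0.005, 0.001],   # Adam learning rate
    "lstm_hidden":   [100, 150, 200],
    "word_dim":      [50, 100, 300],
    "window_size":   [5, 10, 15],
    "batch_size":    [100, 500, 1000],
}

def select_hyperparameters(train_and_evaluate):
    """Return the configuration with the best validation score."""
    best_score, best_config = float("-inf"), None
    for values in product(*GRID.values()):
        config = dict(zip(GRID.keys(), values))
        score = train_and_evaluate(config)    # trains with Adam, returns val metric
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score
```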