Neural Collective Entity Linking Based on Recurrent Random Walk Network Learning
Authors: Mengge Xue, Weiming Cai, Jinsong Su, Linfeng Song, Yubin Ge, Yubao Liu, Bin Wang
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results and in-depth analysis on various datasets show that our model achieves better performance than other state-of-the-art models. |
| Researcher Affiliation | Collaboration | Mengge Xue (1,2,3), Weiming Cai (1), Jinsong Su (1), Linfeng Song (4), Yubin Ge (4), Yubao Liu (4) and Bin Wang (5); 1 Xiamen University; 2 Institute of Information Engineering, Chinese Academy of Sciences; 3 School of Cyber Security, University of Chinese Academy of Sciences; 4 Rochester University; 5 Xiaomi AI Lab, Xiaomi Inc., Beijing, China |
| Pseudocode | No | The paper describes the model algorithm in prose and mathematical equations but does not include structured pseudocode or an algorithm block. |
| Open Source Code | Yes | Our code and data are released at https://github.com/DeepLearnXMU/RRWEL. |
| Open Datasets | Yes | We validated our proposed model on six different benchmark datasets used by previous studies. AIDA-CONLL: a manually annotated EL dataset [Hoffart et al., 2011]; it consists of AIDA-train for training, AIDA-A for validation and AIDA-B for testing, containing 946, 216 and 231 documents, respectively. MSNBC, AQUAINT, ACE2004: these datasets are cleaned and updated by Guo and Barbosa [2018] and contain 20, 50 and 36 documents, respectively. WNED-WIKI (WIKI), WNED-CWEB (CWEB): these datasets are automatically extracted from ClueWeb and Wikipedia in [Guo and Barbosa, 2018; Gabrilovich et al., 2013] and are relatively large, with 320 documents each. |
| Dataset Splits | Yes | It consists of AIDA-train for training, AIDA-A for validation and AIDA-B for testing, with 946, 216 and 231 documents, respectively. In our experiments, we investigated the system performance with AIDA-train for training and AIDA-A for validation, and then tested on AIDA-B and the other datasets. (A split-layout sketch follows this table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper mentions 'Word2Vec toolkit' and 'CNNs' but does not specify version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | To employ CNN to learn the distributed representations of inputs, we used 64 filters with the window size 3 for the convolution operation and the non-linear transformation function ReLU. Meanwhile, to learn the context representations of input mentions and target entities, we directly followed Francis-Landau [2016] to utilize the window size 10 for context, and only extracted the first 100 words in the documents for mentions and entities. Besides, the standard Word2Vec toolkit [Mikolov et al., 2013] was used with vector dimension size 300, window context size 21, negative sample number 10 and iteration number 10 to pre-train word embeddings on Wikipedia and then we fine-tuned them during model training. Particularly, following Ganea and Hofmann [2017], we kept top 7 candidates for each mention based on their prior probabilities, and while propagating the evidence, we just kept top 4 candidates for each mention m_i according to p_local(·|m_i). Besides, we set α = 1e-5. Finally, we adopted standard F1 score at Mention level (Micro) as measurement. To optimize this objective function, we employ the stochastic gradient descent (SGD) with the diagonal variant of AdaGrad in [Duchi et al., 2011]. (A hedged configuration sketch follows this table.) |
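
As referenced in the Dataset Splits row, the following is a minimal sketch of the benchmark/split layout quoted above. The document counts and the train/validation/test roles come from the quoted paper text; the `DATASETS` dict, the file-free layout and the `splits` helper are illustrative assumptions of this report, not part of the released RRWEL code.

```python
# Hedged sketch of the benchmark/split layout described in the table above.
# Document counts are taken from the quoted paper text; everything else is illustrative.
DATASETS = {
    "AIDA-train": {"role": "train",      "documents": 946},
    "AIDA-A":     {"role": "validation", "documents": 216},
    "AIDA-B":     {"role": "test",       "documents": 231},
    "MSNBC":      {"role": "test",       "documents": 20},
    "AQUAINT":    {"role": "test",       "documents": 50},
    "ACE2004":    {"role": "test",       "documents": 36},
    "WNED-WIKI":  {"role": "test",       "documents": 320},
    "WNED-CWEB":  {"role": "test",       "documents": 320},
}

def splits(role):
    """Return the benchmark names used for a given role (train/validation/test)."""
    return [name for name, meta in DATASETS.items() if meta["role"] == role]

assert splits("train") == ["AIDA-train"]
assert splits("validation") == ["AIDA-A"]
```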
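
The Experiment Setup row lists the concrete hyperparameters, so a small configuration sketch may help when attempting a re-run. This is a minimal PyTorch sketch under stated assumptions: `CONFIG`, `ContextCNN` and the learning rate are illustrative names/values, not the authors' released implementation (linked in the Open Source Code row), and `torch.optim.Adagrad` is used here as the closest stock counterpart of the diagonal AdaGrad variant of Duchi et al. [2011] mentioned in the paper.

```python
# Hedged sketch of the reported training configuration; not the authors' code.
import torch
import torch.nn as nn

# Hyperparameters reported in the Experiment Setup row above.
CONFIG = {
    "cnn_filters": 64,           # number of convolution filters
    "cnn_window": 3,             # convolution window size
    "context_window": 10,        # words of context around each mention
    "doc_words": 100,            # first 100 document words for mentions/entities
    "embedding_dim": 300,        # Word2Vec vector dimension
    "top_candidates": 7,         # candidates kept per mention (by prior probability)
    "propagation_candidates": 4, # candidates kept while propagating evidence
    "alpha": 1e-5,
}

# Word embeddings were pre-trained on Wikipedia with Word2Vec
# (dim 300, window 21, 10 negative samples, 10 iterations), e.g. with gensim 4:
#   Word2Vec(corpus, vector_size=300, window=21, negative=10, epochs=10)

class ContextCNN(nn.Module):
    """1-D CNN text encoder with 64 filters, window size 3 and ReLU, as reported."""
    def __init__(self, dim=CONFIG["embedding_dim"],
                 filters=CONFIG["cnn_filters"], window=CONFIG["cnn_window"]):
        super().__init__()
        self.conv = nn.Conv1d(dim, filters, kernel_size=window, padding=window // 2)

    def forward(self, x):                              # x: (batch, seq_len, dim)
        h = torch.relu(self.conv(x.transpose(1, 2)))   # (batch, filters, seq_len)
        return h.max(dim=2).values                     # max-pool over time

model = ContextCNN()
# The paper reports SGD with the diagonal AdaGrad variant; the learning rate
# below is an assumption, as none is given in the quoted text.
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01)
```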