Mention and Entity Description Co-Attention for Entity Disambiguation

Authors: Feng Nie, Yunbo Cao, Jinpeng Wang, Chin-Yew Lin, Rong Pan

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our evaluation shows that the proposed model outperforms the state-of-the-arts on three public datasets. Further analysis also confirms that both the co-attention mechanism and the type-aware mechanism are effective.
Researcher Affiliation | Collaboration | Sun Yat-sen University; Tencent Corporation, Beijing, China; Microsoft Research Asia
Pseudocode | No | The paper describes the model architecture and training process in text and mathematical equations but does not include any explicitly labeled pseudocode blocks or algorithms.
Open Source Code | No | The paper states: "We will publish the type data for entities in the evaluation dataset." This refers to data, not the source code for the proposed methodology. There is no other statement providing concrete access to the source code of their method.
Open Datasets | Yes | We evaluate TypeCoAtt with the following three datasets: ACE (Bentivogli et al. 2010), CoNLL (Hoffart et al. 2011), and KBP 2010.
Dataset Splits | Yes | We randomly split the new dataset into 10 folds, and then use 9 of them for model training and the remaining one for hyperparameter tuning. (A split sketch is given below the table.)
Hardware Specification | No | The paper describes software and training parameters but does not specify any hardware details such as GPU models (e.g., NVIDIA A100), CPU types, or other specific computational resources used for the experiments.
Software Dependencies | No | The paper mentions using "the word2vec toolkit (Mikolov et al. 2013)", "NLTK", and the "AdaDelta optimizer (Zeiler 2012)" but does not provide specific version numbers for these software components or any other libraries required for reproducibility.
Experiment Setup | Yes | The dimensionality of the word embeddings is set to 300. The dimensionality of the hidden units in LSTM is set to 300. We use the stochastic gradient descent algorithm and the AdaDelta optimizer (Zeiler 2012). The parameters in LSTM are initialized using a normal distribution with a mean of 0 and a variance of 6/(d_in + d_out), and all the other parameters for co-attention are initialized with a uniform distribution U(-0.01, 0.01). (A setup sketch is given below the table.)
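
The 10-fold split quoted under Dataset Splits can be sketched as follows. This is an illustrative reconstruction, not the authors' released code (none is available); the use of scikit-learn's KFold, the shuffle flag, the random seed, and the placeholder example array are all assumptions.

```python
# Illustrative 10-fold split: 9 folds for model training, 1 fold for
# hyperparameter tuning, as described in the paper. Library choice and
# seed are assumptions.
import numpy as np
from sklearn.model_selection import KFold

examples = np.arange(1000)  # placeholder for the mention-entity training examples

kfold = KFold(n_splits=10, shuffle=True, random_state=0)
train_idx, dev_idx = next(iter(kfold.split(examples)))  # 9 folds vs. 1 held-out fold

train_set = examples[train_idx]  # used for model training
dev_set = examples[dev_idx]      # used for hyperparameter tuning
```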
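
The hyperparameters and initializations quoted under Experiment Setup can likewise be sketched in code. This is an illustrative reconstruction under assumptions: PyTorch as the framework, a single bilinear co-attention weight standing in for the model's co-attention parameters, and default AdaDelta settings beyond what the paper reports.

```python
# Sketch of the reported setup: 300-d embeddings and LSTM hidden units,
# N(0, 6/(d_in + d_out)) initialization for LSTM weights, U(-0.01, 0.01)
# for co-attention parameters, and AdaDelta updates (Zeiler 2012).
import math
import torch
import torch.nn as nn

EMB_DIM = 300      # word embedding dimensionality
HIDDEN_DIM = 300   # LSTM hidden-unit dimensionality

lstm = nn.LSTM(input_size=EMB_DIM, hidden_size=HIDDEN_DIM, batch_first=True)

# LSTM weights: normal distribution with mean 0 and variance 6 / (d_in + d_out),
# i.e. standard deviation sqrt(6 / (d_in + d_out)).
for name, param in lstm.named_parameters():
    if "weight" in name:
        d_out, d_in = param.shape
        nn.init.normal_(param, mean=0.0, std=math.sqrt(6.0 / (d_in + d_out)))
    else:
        nn.init.zeros_(param)

# Co-attention parameters: uniform U(-0.01, 0.01). A single hypothetical weight
# matrix is shown here; the actual model has more co-attention parameters.
co_attention_W = nn.Parameter(torch.empty(HIDDEN_DIM, HIDDEN_DIM))
nn.init.uniform_(co_attention_W, -0.01, 0.01)

# Stochastic gradient training with the AdaDelta update rule.
optimizer = torch.optim.Adadelta(list(lstm.parameters()) + [co_attention_W])
```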