A Supervised Multi-Head Self-Attention Network for Nested Named Entity Recognition
Authors: Yongxiu Xu, Heyan Huang, Chong Feng, Yue Hu (pp. 14185-14193)
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To verify the performance of our model, we conduct extensive experiments on both nested and flat datasets. The experimental results show that our model can outperform the previous state-of-the-art methods on multiple tasks without any extra NLP tools or human annotations. In this section, we conduct comprehensive experiments on five datasets, among which three are nested and two are flat. |
| Researcher Affiliation | Academia | 1 Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China 2 School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China 3 School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China |
| Pseudocode | No | The paper describes the model architecture and equations in prose and mathematical notation but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code of our model will be released for future research. 1 https://github.com/xyxAda/Attention_NER |
| Open Datasets | Yes | We performed nested NER task on the widely-used ACE2004, ACE2005 and GENIA datasets and performed flat NER task on JNLPBA and CoNLL03-English datasets. The details of data statistics are summarized in Table 1. 2 https://catalog.ldc.upenn.edu/LDC2005T09 3 https://catalog.ldc.upenn.edu/LDC2006T06 4 http://www.geniaproject.org/genia-corpus/term-corpus 5 http://www.nactem.ac.uk/tsujii/GENIA/ERtask/report.html |
| Dataset Splits | Yes | We reuse the same train/dev/test splits following the previous works (Lu and Roth 2015; Wang and Lu 2018). We split the dataset into training, development and testing sets with the ratio 8.1:0.9:1. Following (Ju, Miwa, and Ananiadou 2018), we randomly select 10% of the sentences from the training set as our development set. We split the dataset into training, development and testing sets with the ratio 8:1:1. (See the split sketch after the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper mentions software tools and models like 'AllenNLP', 'SciBERT-base', 'BERT-base', 'GloVe', 'CNN', 'BiLSTM encoder', and 'Adam optimizer', but it does not specify their version numbers. |
| Experiment Setup | Yes | For the BiLSTM encoder in both the BD and EC modules, the dimension of the hidden state is 200. Each linear fully connected layer has one layer with 150 dimensions. To avoid overfitting, we apply 0.5 dropout to the input embeddings and 0.4 dropout to the BiLSTM encoder. During training, we use the Adam optimizer with a learning rate of 5.0 × 10^-5 and perform linear decay of the learning rate. For hyperparameters, we select δ as 0.5 based on the best development results. (See the training-setup sketch after the table.) |
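
The dataset-split details quoted above (ratio-based splits plus a randomly held-out 10% development set following Ju, Miwa, and Ananiadou 2018) can be made concrete with a short sketch. This is a minimal Python illustration assuming each corpus is available as a plain list of sentences; the function names, the fixed seed, and the shuffling strategy are assumptions, not taken from the authors' released code.

```python
import random

def split_by_ratio(sentences, train=0.81, dev=0.09, seed=42):
    """Split a corpus into train/dev/test with an 8.1:0.9:1 ratio (GENIA-style);
    train=0.8, dev=0.1 gives the 8:1:1 split mentioned for the other corpus."""
    rng = random.Random(seed)
    shuffled = sentences[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_dev = int(len(shuffled) * dev)
    return (shuffled[:n_train],                  # training set
            shuffled[n_train:n_train + n_dev],   # development set
            shuffled[n_train + n_dev:])          # test set

def hold_out_dev(train_sentences, frac=0.10, seed=42):
    """Randomly hold out 10% of the training sentences as a development set,
    mirroring the strategy the paper borrows from Ju, Miwa, and Ananiadou (2018)."""
    rng = random.Random(seed)
    shuffled = train_sentences[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * frac)
    return shuffled[cut:], shuffled[:cut]        # (new training set, dev set)
```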
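
The hyperparameters quoted in the Experiment Setup row (200-dimensional BiLSTM hidden states, 150-dimensional linear layers, 0.5/0.4 dropout, Adam with a 5 × 10^-5 learning rate and linear decay) can be collected into a small PyTorch sketch. The encoder wiring, the embedding dimension, and the step budget below are placeholders; the paper's actual BD/EC modules and multi-head self-attention layers are not reproduced here.

```python
import torch
import torch.nn as nn

# Values reported in the paper.
HIDDEN_DIM = 200      # BiLSTM hidden state size
LINEAR_DIM = 150      # single fully connected layer size
EMB_DROPOUT = 0.5     # dropout on input embeddings
LSTM_DROPOUT = 0.4    # dropout on BiLSTM outputs
LR = 5e-5             # Adam learning rate, linearly decayed

# Placeholders not stated in the quoted excerpt.
EMB_DIM = 768         # assumed BERT-base embedding size
TOTAL_STEPS = 10_000  # depends on dataset size, batch size and epochs

class EncoderSketch(nn.Module):
    """Minimal BiLSTM encoder + linear projection matching the reported sizes."""
    def __init__(self, emb_dim=EMB_DIM):
        super().__init__()
        self.emb_dropout = nn.Dropout(EMB_DROPOUT)
        self.bilstm = nn.LSTM(emb_dim, HIDDEN_DIM,
                              batch_first=True, bidirectional=True)
        self.lstm_dropout = nn.Dropout(LSTM_DROPOUT)
        self.proj = nn.Linear(2 * HIDDEN_DIM, LINEAR_DIM)

    def forward(self, embeddings):
        hidden, _ = self.bilstm(self.emb_dropout(embeddings))
        return self.proj(self.lstm_dropout(hidden))

model = EncoderSketch()
optimizer = torch.optim.Adam(model.parameters(), lr=LR)
# Linear decay of the learning rate over the training steps.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: max(0.0, 1.0 - step / TOTAL_STEPS))
```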