A Supervised Multi-Head Self-Attention Network for Nested Named Entity Recognition
Authors: Yongxiu Xu, Heyan Huang, Chong Feng, Yue Hu (pp. 14185-14193)
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To verify the performance of our model, we conduct extensive experiments on both nested and flat datasets. The experimental results show that our model can outperform the previous state-of-the-art methods on multiple tasks without any extra NLP tools or human annotations. In this section, we conduct comprehensive experiments on five datasets, among which three are nested and two are flat. |
| Researcher Affiliation | Academia | 1 Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China 2 School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China 3 School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China |
| Pseudocode | No | The paper describes the model architecture and equations in prose and mathematical notation but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code of our model will be released for future research. 1 https://github.com/xyxAda/Attention_NER |
| Open Datasets | Yes | We performed nested NER task on the widely-used ACE2004, ACE2005 and GENIA datasets and performed flat NER task on JNLPBA and CoNLL03-English datasets. The details of data statistics are summarized in Table 1. 2 https://catalog.ldc.upenn.edu/LDC2005T09 3 https://catalog.ldc.upenn.edu/LDC2006T06 4 http://www.geniaproject.org/genia-corpus/term-corpus 5 http://www.nactem.ac.uk/tsujii/GENIA/ERtask/report.html |
| Dataset Splits | Yes | We reuse the same train/dev/test splits following the previous works (Lu and Roth 2015; Wang and Lu 2018). We split the dataset into training, development and testing sets with the ratio 8.1:0.9:1. Following (Ju, Miwa, and Ananiadou 2018), we randomly select 10% of the sentences from the training set as our development set. We split the dataset into training, development and testing sets with the ratio 8:1:1. (See the split sketch after the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper mentions software tools and models like 'AllenNLP', 'SciBERT-base', 'BERT-base', 'GloVe', 'CNN', 'BiLSTM encoder', and 'Adam optimizer', but it does not specify their version numbers. |
| Experiment Setup | Yes | For the BiLSTM encoder in both the BD and EC modules, the dimension of the hidden state is 200. Each linear fully connected layer has one layer with 150 dimensions. To avoid overfitting, we apply 0.5 dropout to the input embeddings and 0.4 dropout to the BiLSTM encoder. During training, we use the Adam optimizer with a learning rate of 5.0 × 10^-5 and perform linear decay of the learning rate. For hyperparameters, we select δ as 0.5 based on the best development results. (See the training-setup sketch after the table.) |
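
The dataset-split details quoted above (ratio-based splits plus a randomly held-out 10% development set following Ju, Miwa, and Ananiadou 2018) can be made concrete with a short sketch. This is a minimal Python illustration assuming each corpus is available as a plain list of sentences; the function names, the fixed seed, and the shuffling strategy are assumptions, not taken from the authors' released code.

```python
import random

def split_by_ratio(sentences, train=0.81, dev=0.09, seed=42):
    """Split a corpus into train/dev/test with an 8.1:0.9:1 ratio (GENIA-style);
    train=0.8, dev=0.1 gives the 8:1:1 split mentioned for the other corpus."""
    rng = random.Random(seed)
    shuffled = sentences[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_dev = int(len(shuffled) * dev)
    return (shuffled[:n_train],                  # training set
            shuffled[n_train:n_train + n_dev],   # development set
            shuffled[n_train + n_dev:])          # test set

def hold_out_dev(train_sentences, frac=0.10, seed=42):
    """Randomly hold out 10% of the training sentences as a development set,
    mirroring the strategy the paper borrows from Ju, Miwa, and Ananiadou (2018)."""
    rng = random.Random(seed)
    shuffled = train_sentences[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * frac)
    return shuffled[cut:], shuffled[:cut]        # (new training set, dev set)
```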
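
The hyperparameters quoted in the Experiment Setup row (200-dimensional BiLSTM hidden states, 150-dimensional linear layers, 0.5/0.4 dropout, Adam with a 5 × 10^-5 learning rate and linear decay) can be collected into a small PyTorch sketch. The encoder wiring, the embedding dimension, and the step budget below are placeholders; the paper's actual BD/EC modules and multi-head self-attention layers are not reproduced here.

```python
import torch
import torch.nn as nn

# Values reported in the paper.
HIDDEN_DIM = 200      # BiLSTM hidden state size
LINEAR_DIM = 150      # single fully connected layer size
EMB_DROPOUT = 0.5     # dropout on input embeddings
LSTM_DROPOUT = 0.4    # dropout on BiLSTM outputs
LR = 5e-5             # Adam learning rate, linearly decayed

# Placeholders not stated in the quoted excerpt.
EMB_DIM = 768         # assumed BERT-base embedding size
TOTAL_STEPS = 10_000  # depends on dataset size, batch size and epochs

class EncoderSketch(nn.Module):
    """Minimal BiLSTM encoder + linear projection matching the reported sizes."""
    def __init__(self, emb_dim=EMB_DIM):
        super().__init__()
        self.emb_dropout = nn.Dropout(EMB_DROPOUT)
        self.bilstm = nn.LSTM(emb_dim, HIDDEN_DIM,
                              batch_first=True, bidirectional=True)
        self.lstm_dropout = nn.Dropout(LSTM_DROPOUT)
        self.proj = nn.Linear(2 * HIDDEN_DIM, LINEAR_DIM)

    def forward(self, embeddings):
        hidden, _ = self.bilstm(self.emb_dropout(embeddings))
        return self.proj(self.lstm_dropout(hidden))

model = EncoderSketch()
optimizer = torch.optim.Adam(model.parameters(), lr=LR)
# Linear decay of the learning rate over the training steps.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: max(0.0, 1.0 - step / TOTAL_STEPS))
```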