A Sequence-to-Set Network for Nested Named Entity Recognition
Authors: Zeqi Tan, Yongliang Shen, Shuai Zhang, Weiming Lu, Yueting Zhuang
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our proposed model achieves state-of-the-art on three nested NER corpora: ACE 2004, ACE 2005 and KBP 2017. |
| Researcher Affiliation | Academia | Zeqi Tan, Yongliang Shen, Shuai Zhang, Weiming Lu, Yueting Zhuang. College of Computer Science and Technology, Zhejiang University. {zqtan, syl, zsss, luwm, yzhuang}@zju.edu.cn |
| Pseudocode | No | The paper describes the model architecture and components in text and with a diagram (Figure 2), but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/zqtan1024/sequence-to-set. |
| Open Datasets | Yes | In the experimental setup, we utilize the widely used ACE 2004, ACE 2005, KBP 2017 and GENIA datasets. For ACE 2004 and ACE 2005, we follow the previous work [Katiyar and Cardie, 2018; Lin et al., 2019] to keep files from bn, nw and wl and divide these files into train, dev and test sets in the ratio of 8:1:1, respectively. For GENIA, we use GENIAcorpus3.02 as in Katiyar and Cardie [2018]. For KBP 2017, we use the 2017 English evaluation dataset and the same split strategy in Lin et al. [2019]. |
| Dataset Splits | Yes | For ACE 2004 and ACE 2005, we follow the previous work [Katiyar and Cardie, 2018; Lin et al., 2019] to keep files from bn, nw and wl and divide these files into train, dev and test sets in the ratio of 8:1:1, respectively. The checkpoint that has the best F1 score in the development set is chosen to evaluate the test set. (A split sketch follows this table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using BERT, GloVe, BioBERT, and BioWordVec, but it does not specify version numbers for the software frameworks (e.g., PyTorch, TensorFlow) or libraries used for implementation, nor does it specify a Python version or other ancillary software versions. |
| Experiment Setup | Yes | The number of entity queries N is set to 60 and the vectors are randomly initialized with the normal distribution N(0.0, 0.02). The number of our decoder layer M is set to 3. The number of the attention heads is set to 8. The number of the MLP layer is set to 1. We use the AdamW [Loshchilov and Hutter, 2017] optimizer with a linear warmup-decay learning rate schedule (with peak learning rate of 2e-5 and epoch of 100), a dropout with the rate of 0.1 and a batch size of 8. (A configuration sketch follows this table.) |
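
To make the 8:1:1 document-level split described in the Dataset Splits row concrete, here is a minimal sketch. The directory layout, file pattern and random seed are assumptions, not part of the paper; the paper only states that the bn, nw and wl files of ACE 2004/2005 are kept and divided into train, dev and test sets in the ratio of 8:1:1.

```python
# Hedged sketch of the 8:1:1 split over ACE document files (layout and seed are assumed).
import glob
import random

# Keep only the bn, nw and wl portions, as the paper does for ACE 2004/2005.
# "ace2005/<genre>/*.apf.xml" is a hypothetical directory layout.
files = sorted(
    path
    for genre in ("bn", "nw", "wl")
    for path in glob.glob(f"ace2005/{genre}/*.apf.xml")
)

random.Random(42).shuffle(files)  # seed 42 is an assumption; the paper does not state one

n = len(files)
train_files = files[: int(0.8 * n)]
dev_files = files[int(0.8 * n) : int(0.9 * n)]
test_files = files[int(0.9 * n) :]
```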
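
As a rough illustration of the hyperparameters reported in the Experiment Setup row, the sketch below wires them into a generic PyTorch decoder and AdamW optimizer. The decoder module, hidden size, warmup fraction and steps-per-epoch are placeholders rather than the authors' implementation; only the numeric values quoted in the table come from the paper.

```python
# Hedged configuration sketch; only the hyperparameter values are taken from the paper.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

NUM_QUERIES = 60        # entity queries N
NUM_DECODER_LAYERS = 3  # decoder layers M
NUM_HEADS = 8           # attention heads
NUM_MLP_LAYERS = 1      # MLP layers in the prediction heads
DROPOUT = 0.1
BATCH_SIZE = 8
EPOCHS = 100
PEAK_LR = 2e-5
HIDDEN = 768            # assumed hidden size (BERT-base); not stated in the table row

# Entity query vectors initialised from N(0.0, 0.02) (read here as std = 0.02).
entity_queries = torch.nn.Parameter(
    torch.empty(NUM_QUERIES, HIDDEN).normal_(mean=0.0, std=0.02)
)

# Stand-in decoder; the real sequence-to-set decoder lives in the authors' repository.
decoder = torch.nn.TransformerDecoder(
    torch.nn.TransformerDecoderLayer(d_model=HIDDEN, nhead=NUM_HEADS, dropout=DROPOUT),
    num_layers=NUM_DECODER_LAYERS,
)

optimizer = AdamW(decoder.parameters(), lr=PEAK_LR)

# Linear warmup then linear decay; the warmup fraction and steps-per-epoch are assumptions,
# since the paper only says "linear warmup-decay learning rate schedule".
steps_per_epoch = 1000                      # placeholder value
total_steps = EPOCHS * steps_per_epoch
warmup_steps = int(0.1 * total_steps)

def lr_lambda(step: int) -> float:
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

scheduler = LambdaLR(optimizer, lr_lambda)
```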