A Sequence-to-Set Network for Nested Named Entity Recognition

Authors: Zeqi Tan, Yongliang Shen, Shuai Zhang, Weiming Lu, Yueting Zhuang

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our proposed model achieves state-of-the-art on three nested NER corpora: ACE 2004, ACE 2005 and KBP 2017.
Researcher Affiliation | Academia | Zeqi Tan, Yongliang Shen, Shuai Zhang, Weiming Lu, Yueting Zhuang. College of Computer Science and Technology, Zhejiang University. {zqtan, syl, zsss, luwm, yzhuang}@zju.edu.cn
Pseudocode | No | The paper describes the model architecture and components in text and with a diagram (Figure 2), but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/zqtan1024/sequence-to-set.
Open Datasets | Yes | In the experimental setup, we utilize the widely used ACE 2004, ACE 2005, KBP 2017 and GENIA datasets. For ACE 2004 and ACE 2005, we follow the previous work [Katiyar and Cardie, 2018; Lin et al., 2019] to keep files from bn, nw and wl and divide these files into train, dev and test sets in the ratio of 8:1:1, respectively. For GENIA, we use the GENIA corpus 3.02 as in Katiyar and Cardie [2018]. For KBP 2017, we use the 2017 English evaluation dataset and the same split strategy as in Lin et al. [2019]. (A split sketch follows the table.)
Dataset Splits | Yes | For ACE 2004 and ACE 2005, we follow the previous work [Katiyar and Cardie, 2018; Lin et al., 2019] to keep files from bn, nw and wl and divide these files into train, dev and test sets in the ratio of 8:1:1, respectively. The checkpoint that has the best F1 score on the development set is chosen to evaluate the test set.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using BERT, GloVe, BioBERT and BioWordVec, but it does not specify version numbers for the software frameworks (e.g., PyTorch, TensorFlow) or libraries used for implementation, nor does it specify a Python version or other ancillary software versions.
Experiment Setup | Yes | The number of entity queries N is set to 60 and the vectors are randomly initialized with the normal distribution N(0.0, 0.02). The number of our decoder layers M is set to 3. The number of attention heads is set to 8. The number of MLP layers is set to 1. We use the AdamW [Loshchilov and Hutter, 2017] optimizer with a linear warmup-decay learning rate schedule (with a peak learning rate of 2e-5 over 100 epochs), a dropout rate of 0.1 and a batch size of 8. (A training-setup sketch follows the table.)
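
An illustrative sketch of the 8:1:1 document-level split quoted in the Open Datasets and Dataset Splits rows. The directory layout, file extension, and shuffling seed are assumptions; the paper only specifies the bn/nw/wl file sources and the ratio.

```python
# Hypothetical 8:1:1 file split for ACE 2004 / ACE 2005; only the ratio
# and the bn/nw/wl file sources come from the paper.
import random
from pathlib import Path

def split_files(data_dir: str, seed: int = 42):
    """Partition corpus files into train/dev/test sets in an 8:1:1 ratio."""
    files = sorted(Path(data_dir).glob("*.sgm"))  # assumed ACE file extension
    random.Random(seed).shuffle(files)
    n_train, n_dev = int(0.8 * len(files)), int(0.1 * len(files))
    return (files[:n_train],                 # train (80%)
            files[n_train:n_train + n_dev],  # dev   (10%)
            files[n_train + n_dev:])         # test  (10%)
```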
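
Similarly, a minimal sketch of the training configuration reported in the Experiment Setup row, assuming a PyTorch implementation (the paper does not name its framework). SeqToSetModel, train_loader, dev_loader, and evaluate are hypothetical placeholders, and the warmup fraction and hidden size are assumptions; the remaining hyperparameters are the paper's stated values.

```python
# Sketch of the reported training setup; hypothetical pieces are marked.
import torch
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

NUM_QUERIES, HIDDEN = 60, 768          # hidden size assumed to match BERT-base

# Entity queries: learnable vectors initialized from N(0.0, 0.02).
entity_queries = torch.nn.Parameter(torch.empty(NUM_QUERIES, HIDDEN))
torch.nn.init.normal_(entity_queries, mean=0.0, std=0.02)

model = SeqToSetModel(                 # hypothetical model class
    entity_queries=entity_queries,
    num_decoder_layers=3,
    num_attention_heads=8,
    num_mlp_layers=1,
    dropout=0.1,
)

NUM_EPOCHS, BATCH_SIZE = 100, 8
optimizer = AdamW(model.parameters(), lr=2e-5)    # peak learning rate
total_steps = NUM_EPOCHS * len(train_loader)      # train_loader: hypothetical
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * total_steps),      # warmup fraction: assumption
    num_training_steps=total_steps,
)

# Model selection rule from the Dataset Splits row: keep the checkpoint with
# the best development-set F1 and evaluate only that checkpoint on test.
best_f1 = -1.0
for epoch in range(NUM_EPOCHS):
    for batch in train_loader:
        loss = model(**batch)                     # set-prediction loss
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
    dev_f1 = evaluate(model, dev_loader)          # evaluate: hypothetical helper
    if dev_f1 > best_f1:
        best_f1 = dev_f1
        torch.save(model.state_dict(), "best.pt")
```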