Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Sequence-to-Set Network for Nested Named Entity Recognition

Authors: Zeqi Tan, Yongliang Shen, Shuai Zhang, Weiming Lu, Yueting Zhuang

IJCAI 2021

Reproducibility variables, results, and supporting LLM responses:

Research Type: Experimental
    "Experimental results show that our proposed model achieves state-of-the-art on three nested NER corpora: ACE 2004, ACE 2005 and KBP 2017."

Researcher Affiliation: Academia
    "Zeqi Tan, Yongliang Shen, Shuai Zhang, Weiming Lu, Yueting Zhuang. College of Computer Science and Technology, Zhejiang University."

Pseudocode: No
    The paper describes the model architecture and components in text and with a diagram (Figure 2), but it does not include any structured pseudocode or algorithm blocks.

Open Source Code: Yes
    "The code is available at https://github.com/zqtan1024/sequence-to-set."

Open Datasets: Yes
    "In the experimental setup, we utilize the widely used ACE 2004, ACE 2005, KBP 2017 and GENIA datasets. For ACE 2004 and ACE 2005, we follow the previous work [Katiyar and Cardie, 2018; Lin et al., 2019] to keep files from bn, nw and wl and divide these files into train, dev and test sets in the ratio of 8:1:1, respectively. For GENIA, we use GENIAcorpus3.02 as in Katiyar and Cardie [2018]. For KBP 2017, we use the 2017 English evaluation dataset and the same split strategy as in Lin et al. [2019]."

Dataset Splits: Yes
    "For ACE 2004 and ACE 2005, we follow the previous work [Katiyar and Cardie, 2018; Lin et al., 2019] to keep files from bn, nw and wl and divide these files into train, dev and test sets in the ratio of 8:1:1, respectively. The checkpoint that has the best F1 score on the development set is chosen to evaluate the test set."

Hardware Specification: No
    The paper does not provide any specific details about the hardware used for the experiments, such as GPU models, CPU types, or memory specifications.

Software Dependencies: No
    The paper mentions using BERT, GloVe, BioBERT, and BioWordVec, but it does not specify version numbers for the software frameworks (e.g., PyTorch, TensorFlow) or libraries used for implementation, nor does it specify a Python version or other ancillary software versions.

Experiment Setup: Yes
    "The number of entity queries N is set to 60 and the vectors are randomly initialized with the normal distribution N(0.0, 0.02). The number of our decoder layers M is set to 3. The number of attention heads is set to 8. The number of MLP layers is set to 1. We use the AdamW [Loshchilov and Hutter, 2017] optimizer with a linear warmup-decay learning rate schedule (with a peak learning rate of 2e-5 and 100 epochs), a dropout rate of 0.1 and a batch size of 8."
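The 8:1:1 train/dev/test split reported for ACE 2004 and ACE 2005 can be sketched in a few lines. The function name, the seed, and the shuffle step below are illustrative assumptions; the quoted setup does not say how individual files were assigned to each partition:

```python
import random

def split_8_1_1(files, seed=0):
    """Partition a list of corpus files into train/dev/test at an 8:1:1
    ratio. The deterministic shuffle and seed are illustrative; the
    authors' actual file assignment is not specified."""
    files = list(files)
    random.Random(seed).shuffle(files)  # reproducible shuffle
    n_train = int(len(files) * 0.8)
    n_dev = int(len(files) * 0.1)
    return (files[:n_train],
            files[n_train:n_train + n_dev],
            files[n_train + n_dev:])
```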
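The quoted experiment setup pins down most hyperparameters. A minimal sketch of that configuration, together with one common reading of a "linear warmup-decay" learning rate schedule, is below; the dictionary keys, the function name, and the warmup length are assumptions, since the paper does not report a warmup fraction:

```python
# Hyperparameters quoted from the paper's experiment setup.
# Key names are illustrative, not the authors' actual config schema.
CONFIG = {
    "num_entity_queries": 60,   # N, vectors initialized ~ Normal(0.0, 0.02)
    "decoder_layers": 3,        # M
    "attention_heads": 8,
    "mlp_layers": 1,
    "peak_lr": 2e-5,
    "epochs": 100,
    "dropout": 0.1,
    "batch_size": 8,
}

def linear_warmup_decay(step, total_steps, warmup_steps, peak_lr):
    """Linearly warm up from 0 to peak_lr over warmup_steps, then
    linearly decay back to 0 by total_steps. One common interpretation
    of the schedule; the paper does not state the warmup length."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))
```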