Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
A Sequence-to-Set Network for Nested Named Entity Recognition
Authors: Zeqi Tan, Yongliang Shen, Shuai Zhang, Weiming Lu, Yueting Zhuang
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our proposed model achieves state-of-the-art on three nested NER corpora: ACE 2004, ACE 2005 and KBP 2017. |
| Researcher Affiliation | Academia | Zeqi Tan, Yongliang Shen, Shuai Zhang, Weiming Lu, Yueting Zhuang, College of Computer Science and Technology, Zhejiang University |
| Pseudocode | No | The paper describes the model architecture and components in text and with a diagram (Figure 2), but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/zqtan1024/sequence-to-set. |
| Open Datasets | Yes | In the experimental setup, we utilize the widely used ACE 2004, ACE 2005, KBP 2017 and GENIA datasets. For ACE 2004 and ACE 2005, we follow the previous work [Katiyar and Cardie, 2018; Lin et al., 2019] to keep files from bn, nw and wl and divide these files into train, dev and test sets in the ratio of 8:1:1, respectively. For GENIA, we use GENIAcorpus3.02 as in Katiyar and Cardie [2018]. For KBP 2017, we use the 2017 English evaluation dataset and the same split strategy as in Lin et al. [2019]. |
| Dataset Splits | Yes | For ACE 2004 and ACE 2005, we follow the previous work [Katiyar and Cardie, 2018; Lin et al., 2019] to keep files from bn, nw and wl and divide these files into train, dev and test sets in the ratio of 8:1:1, respectively. The checkpoint that has the best F1 score in the development set is chosen to evaluate the test set. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using BERT, GloVe, BioBERT, and BioWordVec, but it does not specify the version numbers of the software frameworks (e.g., PyTorch, TensorFlow) or libraries used for implementation, nor does it specify a Python version or other ancillary software versions. |
| Experiment Setup | Yes | The number of entity queries N is set to 60 and the vectors are randomly initialized with the normal distribution N(0.0, 0.02). The number of our decoder layers M is set to 3. The number of attention heads is set to 8. The number of MLP layers is set to 1. We use the AdamW [Loshchilov and Hutter, 2017] optimizer with a linear warmup-decay learning rate schedule (with a peak learning rate of 2e-5 over 100 epochs), a dropout rate of 0.1 and a batch size of 8. |
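The hyperparameters quoted above can be collected into a minimal sketch, together with the linear warmup-decay schedule the paper names. This is an illustration only, not the authors' code: the warmup fraction (10% here) and the total step count are assumptions, since the paper does not state them.

```python
# Hedged sketch of the reported training setup. Values come from the
# Experiment Setup row; warmup_frac is an assumption (not in the paper).

HPARAMS = {
    "num_entity_queries": 60,   # N; query vectors drawn from Normal(0.0, 0.02)
    "decoder_layers": 3,        # M
    "attention_heads": 8,
    "mlp_layers": 1,
    "peak_lr": 2e-5,
    "epochs": 100,
    "dropout": 0.1,
    "batch_size": 8,
}


def linear_warmup_decay(step, total_steps, peak_lr, warmup_frac=0.1):
    """Learning rate rising linearly to peak_lr, then decaying linearly to 0."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = total_steps - warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / remaining)


if __name__ == "__main__":
    total = 1000  # hypothetical total number of optimizer steps
    for s in (0, 50, 100, 500, 1000):
        print(f"step {s:4d}: lr = {linear_warmup_decay(s, total, HPARAMS['peak_lr']):.2e}")
```

With a 10% warmup over 1000 steps, the rate climbs to 2e-5 at step 100 and falls back to zero at step 1000.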