Rethinking Boundaries: End-To-End Recognition of Discontinuous Mentions with Pointer Networks

Authors: Hao Fei, Donghong Ji, Bobo Li, Yijiang Liu, Yafeng Ren, Fei Li (pp. 12785-12793)

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the CADEC and ShARe13 datasets show that our model outperforms flat and hypergraph models as well as a state-of-the-art transition-based model for discontinuous NER.
Researcher Affiliation | Academia | 1 Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan, China; 2 Guangdong University of Foreign Studies, Guangzhou, China. {hao.fei, dhji, boboli, cslyj, renyafeng}@whu.edu.cn, foxlf823@gmail.com
Pseudocode | No | The paper describes the model architecture and processes in text and diagrams, but does not provide structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available.
Open Datasets | Yes | We experiment on two datasets for discontinuous NER, namely CADEC (Karimi et al. 2015) and ShARe13 (Pradhan et al. 2013), both of which are derived from biomedical or clinical domain documents.
Dataset Splits | Yes | In Table 1, we present the detailed statistics of the two datasets. CADEC: #Train 875, #Dev 187, #Test 188; ShARe13: #Train 180, #Dev 19, #Test 99.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions various models and optimizers such as 'Transformer', 'LSTM', 'FastText', 'ELMo', 'BioBERT', and the 'Adam optimizer', but does not specify version numbers for any software dependencies or libraries.
Experiment Setup | Yes | The dimensions of word embeddings, position embeddings and character representations are 300, 30 and 50, respectively. We use a 3-layer Transformer with a 768-dimensional hidden size as the encoder. The dimensions of all other intermediate representations are set to 300. The kernel sizes of the CNN are [3, 4, 5]. We adopt the Adam optimizer with an initial learning rate of 1e-4. The mini-batch size is set to 16. Moreover, the initial value of γ is set to 0.85 according to the development experiments.
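Because the paper releases no code, a reimplementation would have to assemble the reported hyperparameters by hand. The snippet below is a minimal sketch that collects only the values quoted above into a Python configuration object; the class and field names (e.g., PointerNetConfig, intermediate_dim) are illustrative assumptions, not names from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PointerNetConfig:
    """Hyperparameters reported in the paper; field names are illustrative."""
    word_emb_dim: int = 300        # word embedding dimension
    pos_emb_dim: int = 30          # position embedding dimension
    char_repr_dim: int = 50        # character representation dimension
    encoder_layers: int = 3        # Transformer encoder layers
    encoder_hidden: int = 768      # Transformer hidden size
    intermediate_dim: int = 300    # all other intermediate representations
    cnn_kernel_sizes: List[int] = field(default_factory=lambda: [3, 4, 5])
    optimizer: str = "adam"        # Adam optimizer
    learning_rate: float = 1e-4    # initial learning rate
    batch_size: int = 16           # mini-batch size
    gamma_init: float = 0.85       # initial value of γ from development experiments


if __name__ == "__main__":
    config = PointerNetConfig()
    print(config)
```

Hardware, software versions, and random seeds are not reported, so any such settings would be additional assumptions on top of this configuration.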