Adversarial Transfer for Named Entity Boundary Detection with Pointer Networks

Authors: Jing Li, Deheng Ye, Shuo Shang

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct Formal Text→Formal Text, Formal Text→Informal Text and ablation evaluations on five benchmark datasets. Experimental results show that AT-BDRY achieves state-of-the-art transferring performance against recent baselines." and Section 4 (Experiments)
Researcher Affiliation | Industry | "1 Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates; 2 Tencent AI Lab, Shenzhen, China"
Pseudocode | No | The paper presents architectural diagrams and equations (e.g., Figure 1, Figure 2, Equations 1-11) but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about open-sourcing the code for the described methodology, nor does it include a link to a code repository.
Open Datasets | Yes | "We use five popular benchmark datasets to ascertain the effectiveness of AT-BDRY. Because our task is boundary detection, we ignore entity types in all datasets. The statistics of the datasets are reported in Table 1. CoNLL03, OntoNotes5.0 and WikiGold are formal text. WNUT16 and WNUT17 are informal text."
Dataset Splits | Yes | "We randomly leave out 20% of training set, and combine it with development set as annotated target-domain data for these three baselines." (a code sketch of this split appears after the table) and Table 1 (Statistics of datasets):
Dataset | #Sentences (Train / Dev / Test) | #Mentions
CoNLL03 | 14,987 / 3,466 / 3,684 | 34,841
OntoNotes5.0 | 59,917 / 8,528 / 8,262 | 71,031
WikiGold | 144,342 / - / 1,696 | 298,961
WNUT16 | 2,394 / 1,000 / 3,856 | 5,630
WNUT17 | 3,394 / 1,009 / 1,287 | 3,890
Hardware Specification | Yes | "All neural network models are implemented with PyTorch framework and evaluated on NVIDIA Tesla V100 GPU."
Software Dependencies | No | "All neural network models are implemented with PyTorch framework and evaluated on NVIDIA Tesla V100 GPU."
Experiment Setup | Yes | "For all neural network models, we use GloVe 300-dimensional pre-trained word embeddings released by Stanford, which are fine-tuned during training. The dimension of character-level representation is 100 and the CNN sliding windows of filters are [2, 3, 4, 5]. The total number of CNN filters is 100. Each bidirectional encoder GRU has a depth of 3 and hidden size of 128. Each decoder GRU has a depth of 3 and hidden size of 256. Note that the encoder GRU is bidirectional and the decoder GRU is unidirectional in our model. Thus, the decoder has twice the hidden size of the encoder. The Adam optimizer was adopted with a learning rate of 0.001, selected from {0.01, 0.001, 0.0001}. We use a dropout of 0.5 after the convolution or recurrent layers. The decay rate is 0.09 and the gradient clip is 5.0."
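
The hyperparameters quoted in the Experiment Setup row map directly onto a PyTorch configuration. The sketch below is an illustrative reconstruction under stated assumptions, not the authors' code: the even split of the 100 filters across the four window sizes, the character-embedding input size, the decoder input size, and the role of the 0.09 decay rate (the quote does not say whether it is learning-rate or weight decay, so it is omitted here) are all assumptions.

```python
import torch
import torch.nn as nn

# Quoted hyperparameters; module names and structure are illustrative assumptions.
WORD_EMB_DIM = 300            # GloVe 300-d word embeddings, fine-tuned during training
CHAR_REP_DIM = 100            # character-level representation dimension
CNN_WINDOWS = [2, 3, 4, 5]    # CNN sliding-window (kernel) sizes
CNN_TOTAL_FILTERS = 100       # assumed to be split evenly across window sizes
ENC_HIDDEN, DEC_HIDDEN = 128, 256   # decoder hidden size = 2 x encoder hidden size
NUM_LAYERS = 3                # "depth of 3" for both encoder and decoder GRUs
DROPOUT = 0.5

char_cnn = nn.ModuleList([
    nn.Conv1d(CHAR_REP_DIM, CNN_TOTAL_FILTERS // len(CNN_WINDOWS), kernel_size=k)
    for k in CNN_WINDOWS
])  # character CNN; using 100 as the input channel size is an assumption
encoder = nn.GRU(WORD_EMB_DIM + CNN_TOTAL_FILTERS, ENC_HIDDEN,
                 num_layers=NUM_LAYERS, bidirectional=True, batch_first=True)
decoder = nn.GRU(DEC_HIDDEN, DEC_HIDDEN,  # decoder input size is an assumption
                 num_layers=NUM_LAYERS, bidirectional=False, batch_first=True)
dropout = nn.Dropout(DROPOUT)  # applied after convolution/recurrent layers

params = (list(char_cnn.parameters()) + list(encoder.parameters())
          + list(decoder.parameters()))
optimizer = torch.optim.Adam(params, lr=0.001)  # lr chosen from {0.01, 0.001, 0.0001}
# Per training step, gradients would be clipped to the quoted value of 5.0:
#   torch.nn.utils.clip_grad_norm_(params, max_norm=5.0)
```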
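
The 20% leave-out procedure quoted in the Dataset Splits row is straightforward to reproduce. The following is a minimal sketch, assuming sentence-level lists; the function name, arguments, and fixed seed are hypothetical and not from the paper.

```python
import random

def make_annotated_target_set(train_sentences, dev_sentences,
                              holdout_frac=0.2, seed=13):
    """Randomly leave out a fraction (20%) of the training set and combine it
    with the development set, mirroring the annotated target-domain data used
    for the three baselines mentioned in the quote. Names and seed are illustrative."""
    rng = random.Random(seed)
    shuffled = list(train_sentences)
    rng.shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_frac)
    held_out = shuffled[:n_holdout]         # the 20% removed from training
    remaining_train = shuffled[n_holdout:]  # the 80% that stays in training
    annotated_target = held_out + list(dev_sentences)
    return remaining_train, annotated_target
```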