CNN-Based Chinese NER with Lexicon Rethinking

Authors: Tao Gui, Ruotian Ma, Qi Zhang, Lujun Zhao, Yu-Gang Jiang, Xuanjing Huang

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on four datasets show that the proposed method can achieve better performance than both word-level and character-level baseline methods.
Researcher Affiliation | Collaboration | Tao Gui (1), Ruotian Ma (1), Qi Zhang (1), Lujun Zhao (1), Yu-Gang Jiang (1,2), and Xuanjing Huang (1); (1) School of Computer Science, Fudan University, Shanghai, China; (2) Jilian Technology Group (Video++), Shanghai, China
Pseudocode | No | No pseudocode or algorithm blocks were found.
Open Source Code | Yes | Our code are released at https://github.com/guitaowufeng/LR-CNN.
Open Datasets | Yes | We evaluate the proposed method on four datasets, including OntoNotes [Weischedel et al., 2011], MSRA [Levow, 2006], Weibo NER [Peng and Dredze, 2015; He and Sun, 2016], and Resume NER [Zhang and Yang, 2018].
Dataset Splits | Yes | Table 1: Statistics of datasets. OntoNotes: Train 15.7k / Dev 4.3k / Test 4.3k sentences (491.9k / 200.5k / 208.1k chars); MSRA: Train 46.4k / Test 4.4k sentences (2169.9k / 172.6k chars; no development split is reported); Weibo: Train 1.4k / Dev 0.27k / Test 0.27k sentences (73.8k / 14.5k / 14.8k chars); Resume: Train 3.8k / Dev 0.46k / Test 0.48k sentences (124.1k / 13.9k / 15.1k chars). These counts are also collected into a small data structure in the sketch after the table.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory) used for the experiments.
Software Dependencies | No | The paper mentions Adamax optimization and word2vec, but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | For all four of the datasets, we used the Adamax [Kingma and Ba, 2014] optimization to train our networks. The initial learning rate was set at 0.0015, with a decay rate of 0.05. To avoid overfitting, we employed the dropout technique (50% dropout rate) on the character embeddings, lexicon embeddings and each layer of the CNNs. The character embeddings and lexicon embeddings were initialized by a pretrained embedding and then fine-tuned during the training. The character embedding size and lexicon embedding size were set to 50. For the biggest dataset, MSRA, we used five layers of CNNs with an output channel size of 300. For the other datasets, we used four layers of CNNs with an output channel size of 128. We used early stopping, based on the performance on the development set.
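
As a concrete reading of the Experiment Setup row above, the sketch below collects the reported hyperparameters into a PyTorch-style configuration. Only the Adamax optimizer, the initial learning rate of 0.0015, the 0.05 decay rate, the 50% dropout, the embedding size of 50, and the per-dataset CNN depth/width come from the paper; the exact learning-rate decay formula and its per-epoch application are assumptions, and should be checked against the released code.

```python
# Minimal sketch of the reported training setup (PyTorch). The decay schedule
# lr_t = lr_0 / (1 + decay * epoch) is an assumption; the paper only states an
# initial learning rate of 0.0015 and a decay rate of 0.05.
import torch
import torch.nn as nn

EMB_DIM = 50           # character and lexicon embedding size (from the paper)
DROPOUT = 0.5          # applied to embeddings and every CNN layer (from the paper)
INIT_LR, DECAY = 0.0015, 0.05

# Per-dataset CNN depth and output channel size (from the paper).
CNN_CONFIG = {
    "msra":      {"layers": 5, "channels": 300},
    "ontonotes": {"layers": 4, "channels": 128},
    "weibo":     {"layers": 4, "channels": 128},
    "resume":    {"layers": 4, "channels": 128},
}

def build_optimizer(model: nn.Module) -> torch.optim.Optimizer:
    """Adamax with the paper's initial learning rate."""
    return torch.optim.Adamax(model.parameters(), lr=INIT_LR)

def decayed_lr(epoch: int) -> float:
    """Assumed decay schedule; not specified in the quoted text."""
    return INIT_LR / (1.0 + DECAY * epoch)

def set_lr(optimizer: torch.optim.Optimizer, epoch: int) -> None:
    """Apply the (assumed) per-epoch learning-rate decay."""
    for group in optimizer.param_groups:
        group["lr"] = decayed_lr(epoch)
```

The early stopping mentioned in the setup would wrap the usual epoch loop around these helpers, monitoring development-set performance.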
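
For quick reference, the split sizes from the Dataset Splits row above can also be kept as a small mapping. The structure below is a convenience summary, not part of the paper's release; the sentence counts are the approximate values from Table 1, and MSRA is recorded without a development split.

```python
# Approximate sentence counts per split, transcribed from Table 1 of the paper.
# MSRA reports no development split, hence dev=None.
DATASET_SPLITS = {
    "ontonotes": {"train": 15_700, "dev": 4_300, "test": 4_300},
    "msra":      {"train": 46_400, "dev": None,  "test": 4_400},
    "weibo":     {"train": 1_400,  "dev": 270,   "test": 270},
    "resume":    {"train": 3_800,  "dev": 460,   "test": 480},
}
```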