Dynamic Modeling Cross- and Self-Lattice Attention Network for Chinese NER
Authors: Shan Zhao, Minghao Hu, Zhiping Cai, Haiwen Chen, Fang Liu (pp. 14515–14523)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on four Chinese NER datasets show that DCSAN obtains state-of-the-art results as well as efficiency compared to several competitive approaches. |
| Researcher Affiliation | Academia | (1) College of Computer, National University of Defense Technology, Changsha, China; (2) Information Research Center of Military Science, PLA Academy of Military Science, Beijing, China; (3) School of Design, Hunan University, Changsha, Hunan |
| Pseudocode | No | The paper describes the proposed model and its components in detail using text and mathematical equations, but it does not include a separate pseudocode block or algorithm figure. |
| Open Source Code | Yes | We will release the source code to facilitate future research in this field. 1https://github.com/zs50910/DCSAN-for-Chinese-NER |
| Open Datasets | Yes | We conduct experiments on four datasets, including Weibo NER (Peng and Dredze 2015), MSRA (Levow 2006), Chinese resume dataset (Zhang and Yang 2018), and E-commerce NER (Ding et al. 2019). |
| Dataset Splits | Yes | Table 1 (statistics of the four Chinese NER datasets, character counts): Weibo — Train 73.8K, Dev 14.5K, Test 14.8K; E-commerce — Train 119.1K, Dev 14.9K, Test 14.7K; Resume — Train 124.1K, Dev 13.9K, Test 15.1K; MSRA — Train 2169.9K, Dev –, Test 172.6K |
| Hardware Specification | No | The paper mentions 'GPU parallelism' and 'GPU' in general terms but does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper mentions using BERT embeddings and a word embedding dictionary, but it does not specify version numbers for any software dependencies or libraries (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | As for hyper-parameter configurations, the size of character embeddings is 768 and that of word embeddings is 200 by default, and the dimensionality of the hidden size is 768. For attention settings, the head numbers of cross-lattice attention and dynamic self-lattice attention are 8 and 4, respectively, for all datasets. We set the number of self-lattice attention layers l to 2 by default. To avoid overfitting, we regularize our network using dropout with a rate tuned on the development set. To train the model, we use the SGD optimizer with a learning rate of 0.0007 on the Resume, MSRA, and E-commerce datasets and 0.001 on the Weibo dataset. Training runs for 100 epochs until convergence. (See the configuration sketch below.) |
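
For quick reference, the reported setup is collected below as a minimal Python configuration sketch, assuming a PyTorch implementation (as suggested by the linked GitHub repository). The class, field, and function names are illustrative assumptions, not identifiers from the authors' released code, and the dropout rate is left unset because the paper only states that it is tuned on the development set.

```python
# Minimal sketch of the reported hyper-parameter configuration.
# Names (DCSANConfig, build_optimizer, field names) are illustrative assumptions,
# not taken from the authors' released repository.
from dataclasses import dataclass
from typing import Optional

@dataclass
class DCSANConfig:
    char_embed_dim: int = 768        # BERT character embedding size
    word_embed_dim: int = 200        # lexicon word embedding size
    hidden_dim: int = 768            # hidden size
    cross_lattice_heads: int = 8     # cross-lattice attention heads
    self_lattice_heads: int = 4      # dynamic self-lattice attention heads
    self_lattice_layers: int = 2     # number of self-lattice attention layers (l)
    dropout: Optional[float] = None  # paper: rate tuned on the development set
    epochs: int = 100

# Per-dataset learning rates reported for the SGD optimizer.
SGD_LEARNING_RATES = {
    "Resume": 0.0007,
    "MSRA": 0.0007,
    "E-commerce": 0.0007,
    "Weibo": 0.001,
}

def build_optimizer(model_parameters, dataset: str):
    """Return an SGD optimizer configured with the reported learning rate."""
    import torch  # assumes a PyTorch implementation
    return torch.optim.SGD(model_parameters, lr=SGD_LEARNING_RATES[dataset])
```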