Multi-modal Graph Fusion for Named Entity Recognition with Targeted Visual Guidance
Authors: Dong Zhang, Suzhong Wei, Shoushan Li, Hanqian Wu, Qiaoming Zhu, Guodong Zhou
AAAI 2021, pp. 14347–14355
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentation on the two benchmark datasets demonstrates the superiority of our MNER model. |
| Researcher Affiliation | Academia | 1 School of Computer Science and Technology, Soochow University, China 2 School of Computer Science and Engineering, Southeast University, China |
| Pseudocode | No | The paper describes the model architecture and processes verbally and mathematically, but does not provide a clearly labeled pseudocode block or algorithm. |
| Open Source Code | Yes | To motivate future research, the code will be released on our homepage: https://github.com/MANLP-suda/UMGF |
| Open Datasets | Yes | Following (Yu et al. 2020), we first use two public Twitter datasets (i.e., Twitter-2015 and Twitter-2017) for MNER, which are provided by (Zhang et al. 2018) and (Lu et al. 2018), respectively. |
| Dataset Splits | Yes | Table 1 shows the number of entities for each type and the size of data split. |
| Hardware Specification | Yes | For all neural models, we conduct all the experiments on NVIDIA GTX 1080 Ti GPUs with PyTorch 1.7. |
| Software Dependencies | Yes | For all neural models, we conduct all the experiments on NVIDIA GTX 1080 Ti GPUs with PyTorch 1.7. |
| Experiment Setup | Yes | The maximum sentence length and the batch size are set to 128 and 16, respectively. For our approach, the word embeddings X are initialized with the cased BERT-base model pre-trained by Devlin et al. (2019) with dimension 768 and fine-tuned during training. The visual embeddings are initialized by ResNet-152 with dimension 2048 and fine-tuned during training. After MLPs, the dimension d of each node is transformed to 512. The number of heads in multi-head attention is set to 8. The learning rate, the dropout rate, and the tradeoff parameter are set to 1e-4, 0.5, and 0.5, respectively, which achieve the best performance on the development sets of both datasets via a small grid search over the combinations of [1e-5, 1e-4], [0.1, 0.5], and [0.1, 0.9]. Based on the best development results, the number of multi-modal graph fusion layers is 2. |
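
For readers attempting a re-run, the reported setup collects naturally into a single configuration object. The sketch below is a minimal, hypothetical Python config: the class name `UMGFConfig` and all field names are ours, not taken from the released UMGF repository; only the numeric values and encoder choices come from the paper's experiment setup.

```python
# Hypothetical configuration mirroring the hyperparameters reported above.
# Names are illustrative; values are the ones stated in the paper.
from dataclasses import dataclass

@dataclass
class UMGFConfig:
    max_seq_len: int = 128               # maximum sentence length
    batch_size: int = 16
    text_encoder: str = "bert-base-cased"  # cased BERT-base, fine-tuned
    text_dim: int = 768                  # BERT-base embedding dimension
    visual_encoder: str = "resnet152"    # ResNet-152 features, fine-tuned
    visual_dim: int = 2048               # ResNet-152 feature dimension
    node_dim: int = 512                  # dimension d after the per-modality MLPs
    num_heads: int = 8                   # heads in multi-head attention
    fusion_layers: int = 2               # layers of multi-modal graph fusion
    learning_rate: float = 1e-4          # grid-searched over [1e-5, 1e-4]
    dropout: float = 0.5                 # grid-searched over [0.1, 0.5]
    tradeoff: float = 0.5                # grid-searched over [0.1, 0.9]

config = UMGFConfig()
```

The three grid-searched values (learning rate, dropout, tradeoff) were selected on the development sets of both Twitter datasets, so a faithful reproduction should tune them the same way rather than hard-coding the reported optima.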