Multi-modal Graph Fusion for Named Entity Recognition with Targeted Visual Guidance

Authors: Dong Zhang, Suzhong Wei, Shoushan Li, Hanqian Wu, Qiaoming Zhu, Guodong Zhou (pp. 14347-14355)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimentation on the two benchmark datasets demonstrates the superiority of our MNER model.
Researcher Affiliation | Academia | 1 School of Computer Science and Technology, Soochow University, China; 2 School of Computer Science and Engineering, Southeast University, China
Pseudocode | No | The paper describes the model architecture and processes verbally and mathematically, but does not provide a clearly labeled pseudocode block or algorithm.
Open Source Code | Yes | To motivate future research, the code will be released in our homepage. (Footnote 3: https://github.com/MANLP-suda/UMGF)
Open Datasets | Yes | Following (Yu et al. 2020), we first use two public Twitter datasets (i.e., Twitter-2015 and Twitter-2017) for MNER, which are provided by (Zhang et al. 2018) and (Lu et al. 2018), respectively.
Dataset Splits | Yes | Table 1 shows the number of entities for each type and the size of data split.
Hardware Specification | Yes | For all neural models, we conduct all the experiments on NVIDIA GTX 1080 Ti GPUs with pytorch 1.7.
Software Dependencies | Yes | For all neural models, we conduct all the experiments on NVIDIA GTX 1080 Ti GPUs with pytorch 1.7.
Experiment Setup | Yes | The maximum length of the sentence input and the batch size are respectively set to 128 and 16. For our approach, the word embeddings X are initialized with the cased BERT-base model pre-trained by Devlin et al. (2019) with dimension of 768, and fine-tuned during training. The visual embeddings are initialized by ResNet152 with dimension of 2048 and fine-tuned during training. After MLPs, the dimension d of each node is transformed into 512. The head size in multi-head attention is set as 8. The learning rate, the dropout rate, and the tradeoff parameter are respectively set to 1e-4, 0.5, and 0.5, which achieve the best performance on the development set of both datasets via a small grid search over the combinations of [1e-5, 1e-4], [0.1, 0.5], and [0.1, 0.9]. Based on the best-performing development results, the layer number of multi-modal graph fusion is 2.
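
To make the stack quoted in the Hardware Specification and Software Dependencies rows (PyTorch 1.7 on NVIDIA GTX 1080 Ti GPUs) easy to verify locally, a minimal environment check is sketched below. It uses only standard PyTorch calls and is not part of the authors' released code.

```python
# Minimal sketch: check the local install against what the paper reports
# (PyTorch 1.7, NVIDIA GTX 1080 Ti). Not taken from the authors' repository.
import torch

print("PyTorch version:", torch.__version__)        # paper reports 1.7
print("CUDA available: ", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))    # paper reports GTX 1080 Ti
```

The hyperparameters quoted in the Experiment Setup row can also be collected into one illustrative configuration, with a small projection-and-attention skeleton around it. This is a hedged sketch, not the released UMGF implementation: the class and field names below are hypothetical, the number of visual regions is an assumption, and only the numeric values come from the quoted setup.

```python
# Hedged sketch of the quoted experiment setup. Names are illustrative; values follow
# the quoted hyperparameters (dims 768/2048 -> 512, 8 heads, 2 fusion layers,
# lr 1e-4, dropout 0.5, tradeoff 0.5, max length 128, batch size 16).
from dataclasses import dataclass

import torch
from torch import nn


@dataclass
class UMGFConfig:
    max_seq_len: int = 128          # maximum sentence length
    batch_size: int = 16
    text_dim: int = 768             # cased BERT-base token embeddings (fine-tuned)
    visual_dim: int = 2048          # ResNet152 visual features (fine-tuned)
    node_dim: int = 512             # dimension d of each graph node after the MLPs
    num_heads: int = 8              # multi-head attention heads
    num_fusion_layers: int = 2      # multi-modal graph fusion layers (best on dev)
    learning_rate: float = 1e-4     # chosen from [1e-5, 1e-4]
    dropout: float = 0.5            # chosen from [0.1, 0.5]
    tradeoff: float = 0.5           # chosen from [0.1, 0.9]


class NodeFusionSketch(nn.Module):
    """Illustrative only: project text (768-d) and visual (2048-d) features into a
    shared 512-d node space and apply 8-head self-attention over all nodes."""

    def __init__(self, cfg: UMGFConfig):
        super().__init__()
        self.text_mlp = nn.Sequential(nn.Linear(cfg.text_dim, cfg.node_dim),
                                      nn.ReLU(), nn.Dropout(cfg.dropout))
        self.visual_mlp = nn.Sequential(nn.Linear(cfg.visual_dim, cfg.node_dim),
                                        nn.ReLU(), nn.Dropout(cfg.dropout))
        # (seq_len, batch, dim) layout keeps this compatible with PyTorch 1.7.
        self.attn = nn.MultiheadAttention(cfg.node_dim, cfg.num_heads,
                                          dropout=cfg.dropout)

    def forward(self, text_feats, visual_feats):
        # text_feats: (seq_len, batch, 768); visual_feats: (num_regions, batch, 2048)
        nodes = torch.cat([self.text_mlp(text_feats),
                           self.visual_mlp(visual_feats)], dim=0)
        fused, _ = self.attn(nodes, nodes, nodes)    # self-attention over all nodes
        return fused


if __name__ == "__main__":
    cfg = UMGFConfig()
    model = NodeFusionSketch(cfg)
    text = torch.randn(cfg.max_seq_len, cfg.batch_size, cfg.text_dim)
    visual = torch.randn(49, cfg.batch_size, cfg.visual_dim)  # 49 regions is an assumption
    print(model(text, visual).shape)  # (max_seq_len + 49, batch_size, 512)
```

The paper's actual graph construction, the use of the tradeoff parameter, and the sequence-labeling decoder are intentionally omitted here; the sketch only shows how the quoted dimensions and head count fit together.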