UMIE: Unified Multimodal Information Extraction with Instruction Tuning

Authors: Lin Sun, Kai Zhang, Qingyuan Li, Renze Lou

AAAI 2024

Reproducibility assessment (variable, result, and the supporting LLM response quoted below each entry):
Research Type: Experimental
"Extensive experiments show that our single UMIE outperforms various state-of-the-art (SoTA) methods across six MIE datasets on three tasks. Furthermore, in-depth analysis demonstrates UMIE's strong generalization in the zero-shot setting, robustness to instruction variants, and interpretability."

Researcher Affiliation: Academia
"(1) Department of Computer Science, Hangzhou City University, China; (2) Department of Computer Science and Engineering, Ohio State University, USA; (3) College of Computer Science and Technology, Zhejiang University, China; (4) Department of Computer Science and Engineering, Pennsylvania State University, USA. Contact: sunl@hzcu.edu.cn, liqingyuan@zju.edu.cn"

Pseudocode: No
"The paper describes its model architecture and components in text and with mathematical equations, but it does not contain structured pseudocode or algorithm blocks."

Open Source Code: Yes
"Our code, data, and model are available at https://github.com/ZUCC-AI/UMIE."

Open Datasets: Yes
"We train and evaluate UMIE on several datasets commonly used in MNER, MRE, and MEE tasks: 1) For MNER, we consider Twitter-15 (Zhang et al. 2018), SNAP (Lu et al. 2018), and Twitter-17 (Yu et al. 2020) (a refined version of SNAP), all curated from the social media platform; 2) For MRE, we adopt the MNRE dataset (Zheng et al. 2021b) constructed from the social media domain via crowdsourcing; 3) For MEE, following previous work (Tong et al. 2020), we employ datasets such as ACE2005 (Walker et al. 2006) and SWiG (Pratt et al. 2020) for training, and the M2E2 dataset for evaluation. ... We will release all MIE datasets with standard format and models trained on them, as a benchmark and starting point for future studies in this area of unified multimodal information extraction."

Dataset Splits: Yes
"Table 3: The statistics of six MIE datasets." The reported splits (task labels for the MNER and MRE rows follow the dataset descriptions quoted above; M2E2 is evaluation-only, so no train/dev counts are reported):

Task | Dataset    | Train  | Dev   | Test
MNER | Twitter-15 | 4,000  | 1,000 | 3,257
MNER | Twitter-17 | 2,848  | 723   | 723
MNER | SNAP       | 3,971  | 1,432 | 1,459
MRE  | MNRE-V1    | 7,824  | 975   | 1,282
MRE  | MNRE-V2    | 12,247 | 923   | 832
MEE  | M2E2       | -      | -     | 309

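For convenience, the reported split sizes can be captured in a small Python mapping. This is a minimal sketch for reproducers; the `MIE_SPLITS` name and structure are illustrative and not part of the released code.

```python
# Split statistics as reported in Table 3 of the paper.
# M2E2 is evaluation-only, so its train/dev entries are None.
MIE_SPLITS = {
    # task: {dataset: (train, dev, test)}
    "MNER": {
        "Twitter-15": (4000, 1000, 3257),
        "Twitter-17": (2848, 723, 723),
        "SNAP":       (3971, 1432, 1459),
    },
    "MRE": {
        "MNRE-V1": (7824, 975, 1282),
        "MNRE-V2": (12247, 923, 832),
    },
    "MEE": {
        "M2E2": (None, None, 309),
    },
}

if __name__ == "__main__":
    # Print a quick sanity-check summary of every split.
    for task, datasets in MIE_SPLITS.items():
        for name, (train, dev, test) in datasets.items():
            print(f"{task:4s} {name:10s} train={train} dev={dev} test={test}")
```
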
Hardware Specification: Yes
"All experiments are conducted on 8 NVIDIA A100 GPUs, each possessing a memory capacity of 40GB."

Software Dependencies: No
"The paper mentions using FLAN-T5 and optimizers like AdamW, but does not provide specific version numbers for these software components or other libraries/frameworks."

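Because no versions are pinned, reproducers must choose their own environment. Below is a minimal sketch of loading the FLAN-T5 backbone, assuming the Hugging Face transformers library and the public `google/flan-t5-large` checkpoint; neither is confirmed as the authors' exact setup.

```python
# Minimal sketch: load the FLAN-T5 backbone named in the paper.
# The checkpoint id and the use of Hugging Face transformers are
# assumptions; the paper does not pin a library or version.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "google/flan-t5-large"  # base variant: "google/flan-t5-base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
```
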
Experiment Setup: Yes
"We train our model by employing label smoothing and AdamW, with a learning rate of 5e-5 for FLAN-T5-large and 1e-4 for FLAN-T5-base. The number of training epochs is set to 40. ... Due to GPU memory limitations, we use different batch sizes: 8 for FLAN-T5-large and 16 for FLAN-T5-base. During the training process, we restrict the text input to a maximum length of 256 and the generated length to 128."
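A minimal sketch mapping these quoted hyperparameters onto Hugging Face `Seq2SeqTrainingArguments`. The use of the transformers Trainer, the `output_dir` value, and the 0.1 label-smoothing factor are assumptions: the paper states that label smoothing is used but does not report the factor.

```python
# Sketch of the quoted training configuration, assuming Hugging Face
# transformers. Values taken from the paper's quote are commented as such;
# everything else is an illustrative assumption.
from transformers import Seq2SeqTrainingArguments

BACKBONE = "flan-t5-large"  # or "flan-t5-base"

args = Seq2SeqTrainingArguments(
    output_dir="umie-checkpoints",                        # hypothetical path
    learning_rate=5e-5 if "large" in BACKBONE else 1e-4,  # per the paper
    num_train_epochs=40,                                  # per the paper
    per_device_train_batch_size=8 if "large" in BACKBONE else 16,  # per the paper
    label_smoothing_factor=0.1,                           # factor not reported; assumed
    optim="adamw_torch",                                  # AdamW, per the paper
    predict_with_generate=True,
    generation_max_length=128,                            # generated length cap, per the paper
)

MAX_INPUT_LENGTH = 256  # text input cap, per the paper; applied at tokenization
# time, e.g. tokenizer(text, truncation=True, max_length=MAX_INPUT_LENGTH)
```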