A Hierarchical Network for Multimodal Document-Level Relation Extraction

Authors: Lingxing Kong, Jiuliang Wang, Zheng Ma, Qifeng Zhou, Jianbing Zhang, Liang He, Jiajun Chen

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on our proposed dataset show that 1) incorporating video information greatly improves model performance; 2) our hierarchical framework has state-of-the-art results compared with both unimodal and multimodal baselines; 3) through collaborating with video information, our model better solves the long-dependency and mention-selection problems." and "We split the constructed dataset into a training set with 2300 samples, a development set with 343 samples, and a testing set with 400 samples. The hyper-parameters for all models are tuned on the development set."
Researcher Affiliation | Collaboration | (1) National Key Laboratory for Novel Software Technology, Nanjing University, China; (2) Institute for AI Industry Research (AIR), Tsinghua University; (3) School of Artificial Intelligence, Nanjing University, China; *Internship at AIR, Tsinghua University
Pseudocode | No | The paper describes its methods and provides architectural diagrams (e.g., Figure 2) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format.
Open Source Code | Yes | "We make our resources available (https://github.com/acddca/MDocRE)."
Open Datasets | Yes | "To support this novel task, we construct a human-annotated dataset with VOA news scripts and videos. Our approach to addressing this task is based on a hierarchical network that adeptly captures and fuses multimodal features at two distinct levels." and "We make our resources available (https://github.com/acddca/MDocRE)."
Dataset Splits | Yes | "We split the constructed dataset into a training set with 2300 samples, a development set with 343 samples, and a testing set with 400 samples."
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. It only mentions general experimental setup details and software.
Software Dependencies | No | The paper mentions using pre-trained language models like BERT and architectural components such as CNN and Transformers, but it does not specify exact version numbers for software dependencies such as the programming language (e.g., Python 3.x), libraries (e.g., PyTorch 1.x), or specific frameworks.
Experiment Setup | Yes | "We set the number of textual-guided transformer layers LN1 in the Global Encoder and LN2 in the Local Encoder to 1 and 2, respectively. We set the number of heads N to 12. The maximum sequence length for textual and visual inputs is set to 512 and 128, respectively. During training, we use a batch size of 4, a learning rate of 1e-5, and a dropout rate of 0.2." (see the configuration sketch below the table)
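
The Dataset Splits and Experiment Setup rows above quote the reported split sizes and hyper-parameters. The sketch below collects those numbers in one place as a Python configuration object. It is an illustrative reconstruction, not the authors' released code (which is linked at https://github.com/acddca/MDocRE): the class and field names such as ExperimentConfig and global_encoder_layers are assumptions; only the numeric values come from the quoted text.

```python
from dataclasses import dataclass


@dataclass
class ExperimentConfig:
    # Dataset split sizes reported in the paper
    num_train: int = 2300
    num_dev: int = 343
    num_test: int = 400

    # Textual-guided transformer depths (LN1 / LN2 in the quoted setup)
    global_encoder_layers: int = 1   # LN1, Global Encoder
    local_encoder_layers: int = 2    # LN2, Local Encoder
    num_attention_heads: int = 12    # N

    # Maximum input lengths
    max_text_len: int = 512          # textual tokens
    max_visual_len: int = 128        # visual features

    # Training hyper-parameters
    batch_size: int = 4
    learning_rate: float = 1e-5
    dropout: float = 0.2


if __name__ == "__main__":
    # Instantiate with the reported defaults; tuning was done on the dev set.
    config = ExperimentConfig()
    print(config)
```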