Attention as Relation: Learning Supervised Multi-head Self-Attention for Relation Extraction

Authors: Jie Liu, Shaowei Chen, Bingquan Wang, Jiaxin Zhang, Na Li, Tong Xu

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To verify the effectiveness of our model, we conduct comprehensive experiments on two benchmark datasets. The experimental results demonstrate that our model achieves state-of-the-art performances.
Researcher Affiliation | Academia | 1 College of Artificial Intelligence, Nankai University, Tianjin, China; 2 College of Computer Science, Nankai University, Tianjin, China; 3 University of Science and Technology of China, Hefei, China
Pseudocode | No | The paper presents mathematical formulations and a model framework diagram, but it does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | https://github.com/NKU-IIPLab/SMHSA
Open Datasets | Yes | To verify the effectiveness of our model, we conduct extensive experiments on two benchmark datasets, including New York Times (NYT) [Riedel et al., 2010] and Web NLG [Gardent et al., 2017].
Dataset Splits | Yes | To construct the development set, we randomly select 10% samples from the training set. The statistics of the above datasets are shown in Table 1.
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions using "pre-trained Glove 840B vectors" and the "RMSprop optimizer," but it does not specify software dependencies with version numbers (e.g., Python version, specific deep learning framework version like PyTorch or TensorFlow).
Experiment Setup | Yes | The dimensions of hidden states for character LSTM, encoding layer, entity extraction module, and relation extraction module are set to 100, 600, 250, 250, respectively. ... The learning rate, learning rate decay, and batch size are set to 0.001, 0.95, and 10, respectively. To ensure the balance between entity extraction and relation detection, we adopt an iterative two-step training manner... To avoid overfitting, we apply dropout at a rate of 0.3. (See the illustrative sketch below.)
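The Dataset Splits and Experiment Setup rows report concrete values: hidden sizes of 100/600/250/250, learning rate 0.001, learning rate decay 0.95, batch size 10, dropout 0.3, the RMSprop optimizer, and a 10% development split drawn from the training set. The sketch below is an illustration only of how these settings might be wired up in PyTorch; the class name SMHSAModel, the helper functions, and the choice of an exponential learning-rate schedule to realize the 0.95 decay are assumptions, not the authors' released code (see the repository linked above for the actual implementation).

```python
# Minimal sketch of the reported training configuration (not the authors' code).
# Assumes PyTorch; `SMHSAModel` is a hypothetical stand-in for the paper's
# entity-extraction + relation-extraction network.
import random
import torch
from torch import nn, optim

# Hyperparameters as reported in the paper.
CHAR_LSTM_DIM = 100    # character LSTM hidden size
ENCODER_DIM = 600      # encoding-layer hidden size
ENTITY_DIM = 250       # entity extraction module hidden size
RELATION_DIM = 250     # relation extraction module hidden size
LEARNING_RATE = 0.001
LR_DECAY = 0.95
BATCH_SIZE = 10
DROPOUT_RATE = 0.3

class SMHSAModel(nn.Module):
    """Hypothetical stand-in; only the reported dimensions are taken from the paper."""
    def __init__(self, word_dim: int = 300):
        super().__init__()
        # Bidirectional encoder whose concatenated hidden state matches ENCODER_DIM.
        self.encoder = nn.LSTM(word_dim, ENCODER_DIM // 2,
                               bidirectional=True, batch_first=True)
        self.dropout = nn.Dropout(DROPOUT_RATE)          # dropout rate 0.3
        self.entity_head = nn.Linear(ENCODER_DIM, ENTITY_DIM)
        self.relation_head = nn.Linear(ENCODER_DIM, RELATION_DIM)

def build_optimizer(model: nn.Module):
    # RMSprop with learning rate 0.001; the 0.95 "learning rate decay" is read here
    # as a per-epoch exponential schedule (one plausible interpretation).
    optimizer = optim.RMSprop(model.parameters(), lr=LEARNING_RATE)
    scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=LR_DECAY)
    return optimizer, scheduler

def split_dev(train_samples, dev_ratio=0.1, seed=0):
    # Randomly hold out 10% of the training set as the development set.
    samples = list(train_samples)
    random.Random(seed).shuffle(samples)
    n_dev = int(len(samples) * dev_ratio)
    return samples[n_dev:], samples[:n_dev]
```

The paper's iterative two-step training alternates between the entity extraction and relation detection objectives to keep the two tasks balanced; that outer training loop is omitted from the sketch above.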