Beyond Word Attention: Using Segment Attention in Neural Relation Extraction
Authors: Bowen Yu, Zhenyu Zhang, Tingwen Liu, Bin Wang, Sujian Li, Quangang Li
IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments are conducted on the TACRED dataset. Results show that our model achieves state-of-the-art performance on the fully-supervised RE task. We conduct qualitative analyses to understand how our model works with the help of segment attention, including evaluation of the extracted relational expressions. |
| Researcher Affiliation | Collaboration | 1Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China 2School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China 3Xiaomi AI Lab, Xiaomi Inc., Beijing, China 4Key Laboratory of Computational Linguistics, Peking University, MOE, China |
| Pseudocode | No | The paper describes algorithmic steps and equations (e.g., for the CRF and the forward-backward algorithm) but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. (A minimal forward-algorithm sketch appears after this table.) |
| Open Source Code | Yes | The source code of this paper can be obtained from https://github.com/yubowen-ph/segment. |
| Open Datasets | Yes | We conduct experiments on the widely used benchmark TACRED dataset introduced in [Zhang et al., 2017], which is currently the largest supervised dataset for relation extraction. |
| Dataset Splits | Yes | For fair comparisons, we report the test score of the run with the median validation score among 5 randomly initialized runs, following the evaluation protocol used in [Zhang et al., 2017]. All the hyper-parameters are tuned on the validation set. (A sketch of this median-run selection appears after this table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like 'Stanford CoreNLP' and 'GloVe embeddings' but does not provide specific version numbers for these or other key software dependencies. |
| Experiment Setup | Yes | Dropout with p = 0.5 is applied after the input layer and before the classifier layer. λ1 and λ2 are chosen from [0, 0.2] via grid search. For the LSTM, the hidden dimension is set to 300 with a 2-layer stacked BiLSTM. The model is trained using stochastic gradient descent for 30 epochs with an initial learning rate of 1.0 and a weight decay of 0.5. (A configuration sketch appears after this table.) |
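
The CRF and forward-backward computations mentioned in the Pseudocode row are given only as equations in the paper. As a rough, non-authoritative illustration of the forward pass such a layer relies on, here is a minimal log-space forward algorithm for a linear-chain CRF; the array names are hypothetical and start/stop transitions are omitted for brevity:

```python
import numpy as np

def crf_log_partition(emissions, transitions):
    """Log partition function Z of a linear-chain CRF, computed with the
    forward algorithm in log space for numerical stability.

    emissions:   (seq_len, num_tags) unary scores per position
    transitions: (num_tags, num_tags) score of moving from tag i to tag j
    """
    alpha = emissions[0]  # forward log-scores at position 0
    for t in range(1, len(emissions)):
        # alpha'[j] = logsumexp_i(alpha[i] + transitions[i, j]) + emissions[t, j]
        scores = alpha[:, None] + transitions + emissions[t][None, :]
        alpha = np.logaddexp.reduce(scores, axis=0)
    return np.logaddexp.reduce(alpha)  # log Z over all tag sequences
```

Running the same recursion backward from the end of the sequence gives the beta scores, and combining alpha and beta yields the per-position marginals that forward-backward is used for.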
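
The evaluation protocol quoted in the Dataset Splits row (report the test score of the run whose validation score is the median over 5 random seeds) is easy to misread, so here is a minimal sketch of it; each run is assumed to be summarized as a `(val_f1, test_f1)` pair, and the numbers in the example are invented:

```python
def reported_test_score(runs):
    """runs: list of (val_f1, test_f1) pairs from independently seeded runs.
    Returns the test score of the run with the median validation score,
    as in the protocol of [Zhang et al., 2017]. Assumes an odd run count."""
    runs = sorted(runs, key=lambda r: r[0])  # order runs by validation F1
    val_median_run = runs[len(runs) // 2]    # middle run has the median val score
    return val_median_run[1]                 # report its test F1

# Five invented (val_f1, test_f1) pairs:
print(reported_test_score([(66.1, 65.0), (66.8, 65.7), (66.5, 65.2),
                           (67.0, 66.1), (66.3, 64.9)]))  # -> 65.2
```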
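
The hyper-parameters in the Experiment Setup row translate into a short PyTorch configuration. The sketch below is an assumption-laden stand-in, not the authors' code: the 300-d input (GloVe) and the 42-way TACRED label set are inferred, the last-step readout replaces the paper's segment-attention pooling, and the quoted weight decay of 0.5 is omitted because its schedule is not specified in the excerpt:

```python
import torch
import torch.nn as nn

EMB_DIM = 300      # assumed: 300-d GloVe input embeddings
HIDDEN = 300       # reported: LSTM hidden dimension 300
NUM_CLASSES = 42   # assumed: TACRED's 41 relations + no_relation

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.input_dropout = nn.Dropout(0.5)   # reported: dropout after input layer
        self.bilstm = nn.LSTM(EMB_DIM, HIDDEN, num_layers=2,
                              bidirectional=True, batch_first=True)
        self.pre_classifier_dropout = nn.Dropout(0.5)  # reported: dropout before classifier
        self.classifier = nn.Linear(2 * HIDDEN, NUM_CLASSES)

    def forward(self, embedded):  # embedded: (batch, seq_len, EMB_DIM)
        states, _ = self.bilstm(self.input_dropout(embedded))
        # Simplification: read out the last time step; the paper instead pools
        # the BiLSTM states with segment attention.
        return self.classifier(self.pre_classifier_dropout(states[:, -1]))

model = Encoder()
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)  # reported: SGD, 30 epochs, lr 1.0
```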