Effective Modeling of Encoder-Decoder Architecture for Joint Entity and Relation Extraction
Authors: Tapas Nayak, Hwee Tou Ng (pp. 8528-8535)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the publicly available New York Times corpus show that our proposed approaches outperform previous work and achieve significantly higher F1 scores. |
| Researcher Affiliation | Academia | Tapas Nayak, Hwee Tou Ng; Department of Computer Science, National University of Singapore; nayakt@u.nus.edu, nght@comp.nus.edu.sg |
| Pseudocode | No | The paper describes the model architecture and algorithms textually and with diagrams (Figure 1), but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and data of this paper can be found at https://github.com/nusnlp/PtrNetDecoding4JERE |
| Open Datasets | Yes | We choose the New York Times (NYT) corpus for our experiments. This corpus has multiple versions, and we choose the following two versions... (i) The first version is used by Zeng et al. (2018) (mentioned as NYT in their paper) and has 24 relations. We name this version as NYT24. (ii) The second version is used by Takanobu et al. (2019) (mentioned as NYT10 in their paper) and has 29 relations. We name this version as NYT29. Experiments on the publicly available New York Times corpus |
| Dataset Splits | Yes | We select 10% of the original training data and use it as the validation dataset. The remaining 90% is used for training. |
| Hardware Specification | No | The paper mentions 'GPU memory' and 'GPU configuration' but does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for experiments. |
| Software Dependencies | No | The paper mentions tools and optimizers like 'Word2Vec' and 'Adam' but does not provide specific version numbers for software dependencies or libraries used for implementation (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We set the word embedding dimension dw = 300, relation embedding dimension dr = 300, character embedding dimension dc = 50, and character-based word feature dimension df = 50. To extract the character-based word feature vector, we set the CNN filter width at 3 and the maximum length of a word at 10. The hidden dimension dh of the decoder LSTM cell is set at 300 and the hidden dimension of the forward and the backward LSTM of the encoder is set at 150. The hidden dimension of the forward and backward LSTM of the pointer networks is set at dp = 300. The model is trained with mini-batch size of 32 and the network parameters are optimized using Adam (Kingma and Ba 2015). Dropout layers with a dropout rate fixed at 0.3 are used in our network to avoid overfitting. |
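
The hyperparameters quoted in the Experiment Setup row above map onto a model configuration roughly as follows. This is a minimal sketch assuming PyTorch (the paper does not name its framework or versions); the module names, vocabulary sizes, and overall wiring here are illustrative and are not taken from the released code.

```python
# Sketch of the reported hyperparameter configuration (assumes PyTorch;
# all class/variable names are illustrative, not from the authors' repository).
import torch
import torch.nn as nn

DW, DR, DC, DF = 300, 300, 50, 50   # word, relation, char, char-based word feature dims
DH, DP = 300, 300                   # decoder LSTM and pointer-network BiLSTM hidden dims
ENC_H = 150                         # per-direction hidden size of the BiLSTM encoder
MAX_WORD_LEN = 10                   # maximum word length for the character CNN

class CharCNN(nn.Module):
    """Character-based word features: char embeddings -> CNN (filter width 3) -> max-pool."""
    def __init__(self, char_vocab_size):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab_size, DC)
        self.conv = nn.Conv1d(DC, DF, kernel_size=3, padding=1)

    def forward(self, char_ids):                      # (batch, seq_len, MAX_WORD_LEN)
        b, s, w = char_ids.size()
        x = self.char_emb(char_ids.view(b * s, w))    # (b*s, w, DC)
        x = self.conv(x.transpose(1, 2))              # (b*s, DF, w)
        return x.max(dim=2).values.view(b, s, DF)     # (batch, seq_len, DF)

class Encoder(nn.Module):
    """BiLSTM encoder over [word embedding; char-based word feature] inputs."""
    def __init__(self, word_vocab_size, char_vocab_size):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab_size, DW)
        self.char_cnn = CharCNN(char_vocab_size)
        self.bilstm = nn.LSTM(DW + DF, ENC_H, batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(0.3)                # dropout rate fixed at 0.3

    def forward(self, word_ids, char_ids):
        x = torch.cat([self.word_emb(word_ids), self.char_cnn(char_ids)], dim=-1)
        out, _ = self.bilstm(self.dropout(x))         # (batch, seq_len, 2*ENC_H) == DH
        return out

# Training setup per the reported configuration: Adam optimizer, mini-batch size 32.
# encoder = Encoder(word_vocab_size=50_000, char_vocab_size=100)  # vocab sizes are placeholders
# optimizer = torch.optim.Adam(encoder.parameters())
```

Note that the two 150-dimensional directions of the encoder BiLSTM concatenate to a 300-dimensional state, matching the 300-dimensional decoder LSTM reported in the paper.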