PPAT: Progressive Graph Pairwise Attention Network for Event Causality Identification
Authors: Zhenyu Liu, Baotian Hu, Zhenran Xu, Min Zhang
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our model achieves state-of-the-art performance on three benchmark datasets (5.5%, 2.2% and 4.5% F1 gains on Event Story Line, MAVEN-ERE and Causal-Time Bank). |
| Researcher Affiliation | Academia | Zhenyu Liu, Baotian Hu, Zhenran Xu and Min Zhang, Harbin Institute of Technology, Shenzhen; liuzhenyuhit@gmail.com, xuzhenran@stu.hit.edu.cn, {hubaotian, zhangmin2021}@hit.edu.cn |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/HITsz-TMG/PPAT. |
| Open Datasets | Yes | We evaluate PPAT on three datasets: Event Story Line (version 0.9) [Caselli and Vossen, 2017], MAVEN-ERE [Wang et al., 2022] and Causal-Time Bank [Mirza, 2014]. |
| Dataset Splits | Yes | Following previous work [Gao et al., 2019; Chen et al., 2022], we use documents in the last two topics as development set, and employ 5-fold cross-validation on the remaining documents. ... Since the original test set does not contain gold labels, we divide the development set into a new development set and a new test set, both of which contain 348 documents. ... Following previous work [Liu et al., 2020; Chen et al., 2022], we employ 10-fold cross-validation evaluation for intra-sentence event pairs. (A hedged sketch of this split procedure follows the table.) |
| Hardware Specification | Yes | We run all the experiments on a single NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions models like BERT-BASE-UNCASED and Longformer-base, and the AdamW optimizer, but does not specify version numbers for any software libraries, programming languages, or environments (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | The models are optimized with AdamW [Loshchilov and Hutter, 2019] with a learning rate of 1e-5 and weight decay of 0.01. We use linear warmup with a 0.1 warmup ratio. We apply a dynamic window to encode the entire document. The window length is 512 for BERT and 2048 for Longformer, and the shift step is 120 for BERT and 500 for Longformer. We train the model for 128 epochs on Event Story Line and 64 on Causal-Time Bank and MAVEN-ERE. ... The loss weights λ_l are set as 2, 6, 0.1, 0.3 for l from 0 to 3. (A hedged sketch of this setup follows the table.) |
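
To make the Event Story Line split procedure concrete, here is a minimal sketch of holding out the last two topics as a development set and running 5-fold cross-validation over the remaining documents. The `documents` structure, the `topic` key, and the random seed are illustrative assumptions, not taken from the paper or its released code.

```python
# Hypothetical sketch of the described split: last two topics -> dev set,
# 5-fold cross-validation over the remaining documents.
from sklearn.model_selection import KFold

def make_splits(documents, n_folds=5, n_dev_topics=2, seed=42):
    """`documents` is assumed to be a list of dicts with a 'topic' key."""
    topics = sorted({doc["topic"] for doc in documents})
    dev_topics = set(topics[-n_dev_topics:])              # last two topics held out as dev
    dev_docs = [d for d in documents if d["topic"] in dev_topics]
    rest = [d for d in documents if d["topic"] not in dev_topics]

    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(rest):
        train_docs = [rest[i] for i in train_idx]
        test_docs = [rest[i] for i in test_idx]
        yield train_docs, dev_docs, test_docs
```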
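
Similarly, a minimal sketch of the reported optimization and dynamic-window settings, assuming a PyTorch/Transformers stack. The model name, total step count, and the `window_encode` helper are illustrative assumptions; only the quoted hyperparameters (learning rate 1e-5, weight decay 0.01, 0.1 warmup ratio, window 512 / shift 120 for BERT) come from the paper.

```python
# Sketch of AdamW + linear warmup and a sliding-window document encoder
# matching the quoted setup; step count and model choice are assumed.
import torch
from transformers import AutoModel, AutoTokenizer, get_linear_schedule_with_warmup

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

num_training_steps = 10_000                       # assumed; depends on dataset size and epochs
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.01)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * num_training_steps),   # 0.1 warmup ratio
    num_training_steps=num_training_steps,
)

def window_encode(token_ids, window=512, shift=120):
    """Split a long document into overlapping windows (dynamic-window encoding)."""
    windows, start = [], 0
    while True:
        windows.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break
        start += shift
    return windows
```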