PPAT: Progressive Graph Pairwise Attention Network for Event Causality Identification

Authors: Zhenyu Liu, Baotian Hu, Zhenran Xu, Min Zhang

IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that our model achieves state-of-the-art performance on three benchmark datasets (5.5%, 2.2% and 4.5% F1 gains on EventStoryLine, MAVEN-ERE and Causal-TimeBank).
Researcher Affiliation | Academia | Zhenyu Liu, Baotian Hu, Zhenran Xu and Min Zhang, Harbin Institute of Technology, Shenzhen; liuzhenyuhit@gmail.com, xuzhenran@stu.hit.edu.cn, {hubaotian, zhangmin2021}@hit.edu.cn
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/HITsz-TMG/PPAT.
Open Datasets | Yes | We evaluate PPAT on three datasets: EventStoryLine (version 0.9) [Caselli and Vossen, 2017], MAVEN-ERE [Wang et al., 2022] and Causal-TimeBank [Mirza, 2014].
Dataset Splits | Yes | Following previous work [Gao et al., 2019; Chen et al., 2022], we use documents in the last two topics as development set, and employ 5-fold cross-validation on the remaining documents. ... Since the original test set does not contain gold labels, we divide the development set into a new development set and a new test set, both of which contain 348 documents. ... Following previous work [Liu et al., 2020; Chen et al., 2022], we employ 10-fold cross-validation evaluation for intra-sentence event pairs. (See the split sketch after the table.)
Hardware Specification | Yes | We run all the experiments on a single NVIDIA A100 GPU.
Software Dependencies | No | The paper mentions models like BERT-BASE-UNCASED and Longformer-base, and the AdamW optimizer, but does not specify version numbers for any software libraries, programming languages, or environments (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | The models are optimized with AdamW [Loshchilov and Hutter, 2019] with the learning rate of 1e-5 and weight decay of 0.01. We use the linear warmup with 0.1 warmup ratio. We apply a dynamic window to encode the entire document. The window length is 512 for BERT and 2048 for Longformer, and the shift step is 120 for BERT and 500 for Longformer. We train the model for 128 epochs on EventStoryLine, 64 on Causal-TimeBank and MAVEN-ERE. ... The loss weights λ_l are set as 2, 6, 0.1, 0.3 for l from 0 to 3. (See the training sketches after the table.)
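The Dataset Splits row describes the EventStoryLine evaluation protocol in prose only. Below is a minimal sketch of how that protocol could be reproduced; the topic-indexed toy corpus is invented purely for illustration, and the actual data handling is in the authors' released code.

```python
from sklearn.model_selection import KFold

# Toy stand-in for EventStoryLine v0.9: {topic_id: [document_id, ...]}.
# The contents are placeholders; the real loader lives in the authors' repo.
docs_by_topic = {t: [f"t{t}_doc{i}" for i in range(12)] for t in range(1, 23)}

topics = sorted(docs_by_topic)
# Last two topics form the fixed development set, per the quoted protocol.
dev_docs = [d for t in topics[-2:] for d in docs_by_topic[t]]
remaining = [d for t in topics[:-2] for d in docs_by_topic[t]]

# 5-fold cross-validation over the remaining documents; the reported score
# is the average over the five folds.
for fold, (train_idx, test_idx) in enumerate(KFold(n_splits=5).split(remaining)):
    train_docs = [remaining[i] for i in train_idx]
    test_docs = [remaining[i] for i in test_idx]
    # train on train_docs, tune on dev_docs, evaluate F1 on test_docs
```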
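The Experiment Setup row reports optimizer and schedule hyperparameters without code. The sketch below shows one way those values could be wired up in PyTorch with Hugging Face's linear-warmup scheduler; `model` and `train_loader` are assumed to exist, so this illustrates the reported settings rather than reproducing the authors' training script.

```python
import torch
from transformers import get_linear_schedule_with_warmup

epochs = 128                              # 128 for EventStoryLine in the quoted setup
total_steps = epochs * len(train_loader)  # train_loader is assumed to exist

# AdamW with the reported learning rate and weight decay.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.01)

# Linear warmup over the first 10% of steps (warmup ratio 0.1), then linear decay.
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * total_steps),
    num_training_steps=total_steps,
)

for _ in range(epochs):
    for batch in train_loader:
        loss = model(**batch)      # placeholder: the real model returns the ECI loss
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```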
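The same row also mentions a dynamic window over the entire document (window length 512 and shift step 120 for BERT; 2048 and 500 for Longformer). Below is a minimal sketch of one way to produce such overlapping windows from a token-id sequence; how the overlapping representations are merged is not stated in the quoted excerpt, so only the chunking step is shown.

```python
def sliding_windows(token_ids, window_len=512, shift=120):
    """Split a long token sequence into overlapping windows.

    With window_len=512 and shift=120 (the quoted BERT settings), consecutive
    windows overlap by 392 tokens; the Longformer settings would be 2048/500.
    """
    windows, start = [], 0
    while True:
        windows.append(token_ids[start:start + window_len])
        if start + window_len >= len(token_ids):
            break
        start += shift
    return windows


# Example: a 1,000-token document yields 6 overlapping windows.
print([len(w) for w in sliding_windows(list(range(1000)))])
```

Each window would then be encoded separately; combining the overlapping token representations back into one document representation is a design choice left to the released code.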