Identify Event Causality with Knowledge and Analogy

Authors: Sifan Wu, Ruihui Zhao, Yefeng Zheng, Jian Pei, Bang Liu

AAAI 2023

Reproducibility assessment. Each entry below gives the variable, the extracted result, and the LLM response with supporting evidence quoted from the paper.
Research Type: Experimental
LLM Response: "Extensive evaluations on two benchmark datasets show that our model outperforms other baseline methods by around 18% on the F1-value on average. ... We conduct extensive experiments on the Event Story Line dataset (Caselli and Vossen 2017) and the Causal-Time Bank dataset (Mirza and Tonelli 2014) to evaluate the performance of KADE and compare with baseline methods. The experimental results show that our method outperforms other SOTA baselines by at least 18% in terms of F1-value on both datasets."

Researcher Affiliation: Collaboration
LLM Response: "Sifan Wu¹, Ruihui Zhao², Yefeng Zheng², Jian Pei³, Bang Liu¹* (¹ RALI & Mila, University of Montreal; ² Tencent Jarvis Lab; ³ Duke University) sifan.wu@umontreal.ca, zachary@ruri.waseda.jp, yefengzheng@tencent.com, j.pei@duke.edu, bang.liu@umontreal.ca"

Pseudocode: Yes
LLM Response: "Algorithm 1: Two-stage Training of KADE"
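
The paper's Algorithm 1 is not reproduced here, but the two-stage schedule it names (40 epochs of joint training, then 40 epochs of classifier-only fine-tuning, per the Experiment Setup entry below) could look roughly like the following PyTorch sketch. `encoder`, `classifier`, and `train_loader` are hypothetical placeholders, and plain AdamW stands in for the paper's BertAdam; this illustrates the schedule only, not the authors' implementation (see their repository for that).

```python
# Illustrative two-stage training schedule; NOT the authors' Algorithm 1.
# `encoder`, `classifier`, and `train_loader` are hypothetical placeholders.
import torch

def train_two_stage(encoder, classifier, train_loader, epochs=40, device="cpu"):
    criterion = torch.nn.CrossEntropyLoss()

    # Stage 1: train encoder and classifier jointly.
    opt = torch.optim.AdamW(
        list(encoder.parameters()) + list(classifier.parameters()),
        lr=1e-6, weight_decay=1e-4,  # values from the Experiment Setup entry
    )
    for _ in range(epochs):
        for inputs, labels in train_loader:
            loss = criterion(classifier(encoder(inputs.to(device))), labels.to(device))
            opt.zero_grad()
            loss.backward()
            opt.step()

    # Stage 2: freeze the encoder and fine-tune only the classifier.
    for p in encoder.parameters():
        p.requires_grad = False
    opt = torch.optim.AdamW(classifier.parameters(), lr=1e-6, weight_decay=1e-4)
    for _ in range(epochs):
        for inputs, labels in train_loader:
            with torch.no_grad():
                features = encoder(inputs.to(device))
            loss = criterion(classifier(features), labels.to(device))
            opt.zero_grad()
            loss.backward()
            opt.step()
```
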
Open Source Code: Yes
LLM Response: "The code is open sourced to facilitate future research, which can be found here: https://github.com/hihihihiwsf/KADE."

Open Datasets: Yes
LLM Response: "Event Story Line v0.9 comes from (Caselli and Vossen 2017), which involves 258 documents, 22 topics, 4,316 sentences, 5,334 event mentions, 7,805 intra-sentence and 46,521 inter-sentence event mention pairs (1,779 and 3,855 are annotated with a causal relation, respectively). ... Causal-Time Bank (Mirza and Tonelli 2014) contains 184 documents, 6,813 events, and 318 of 7,608 event mention pairs annotated with causal relation."

Dataset Splits: Yes
LLM Response: "Following (Liu, Chen, and Zhao 2020), we use the documents of the last two topics as the development set while the documents of the remaining 20 topics are employed for a 5-fold cross-validation evaluation, using the same data split of (Liu, Chen, and Zhao 2020). ... Following (Liu, Chen, and Zhao 2020), we perform 10-fold cross-validation evaluation for Causal-Time Bank."
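
As a concrete reading of this protocol, the sketch below cuts the Event Story Line folds by topic (four held-out topics per fold). The actual fold boundaries come from the released splits of Liu, Chen, and Zhao (2020), and `docs_by_topic` is a hypothetical mapping, so this only illustrates the scheme.

```python
# Illustration of the reported split scheme; the real folds follow the
# released splits of Liu, Chen, and Zhao (2020).
from sklearn.model_selection import KFold

def event_story_line_splits(docs_by_topic):
    """docs_by_topic: hypothetical dict of topic id -> list of documents (22 topics)."""
    topics = sorted(docs_by_topic)
    dev = [d for t in topics[-2:] for d in docs_by_topic[t]]  # last two topics
    cv_topics = topics[:-2]                                    # remaining 20 topics
    # 5-fold cross-validation over the 20 topics (4 held-out topics per fold)
    for train_idx, test_idx in KFold(n_splits=5).split(cv_topics):
        train = [d for i in train_idx for d in docs_by_topic[cv_topics[i]]]
        test = [d for i in test_idx for d in docs_by_topic[cv_topics[i]]]
        yield train, dev, test

# Causal-TimeBank is evaluated analogously but with KFold(n_splits=10) over
# its documents; no separate development topics are mentioned for it.
```
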
Hardware Specification: No
LLM Response: The paper does not provide specific hardware details used for running experiments.

Software Dependencies: No
LLM Response: The paper mentions 'PyTorch' and 'BERT-base' but does not provide specific version numbers for these or any other software dependencies.

Experiment Setup: Yes
LLM Response: "We use uncased BERT-base (Devlin et al. 2018) as the encoder like previous works (Zuo et al. 2021b; Liu, Chen, and Zhao 2020), with 12 layers, embedding dimensions of 768, and 12 heads. We employ feed forward network for the classifier. For analogy enhancement, we use k = 3 most similar entities for all our experiments and show the impact of k. For the optimizer, we use BertAdam (Zhang et al. 2020) and train the model for 40 epochs during the first-stage training, with 1 × 10⁻⁶ as learning rate and 1 × 10⁻⁴ as weight decay. For the second stage of training, we only fine-tune the classifier for 40 epochs. The batch size is set to 16 for both training stages. We also adopt a negative sampling rate of 0.6 for the first step training, owing to the sparseness of positive examples of ECI datasets."
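
Put together, the reported setup maps onto a configuration like the sketch below, which uses Hugging Face Transformers for the encoder and plain torch.optim.AdamW as a stand-in for the paper's BertAdam. `CausalityClassifier` is a hypothetical head, since the paper only states that a feed-forward network is used.

```python
# Hedged sketch of the reported configuration; CausalityClassifier is a
# hypothetical head, and AdamW stands in for the paper's BertAdam optimizer.
import torch
from transformers import BertModel, BertTokenizer

CONFIG = {
    "encoder": "bert-base-uncased",  # 12 layers, 768-d embeddings, 12 heads
    "k_analogy": 3,                  # k most similar entities for analogy enhancement
    "epochs_stage1": 40,             # joint training
    "epochs_stage2": 40,             # classifier-only fine-tuning
    "lr": 1e-6,
    "weight_decay": 1e-4,
    "batch_size": 16,
    "neg_sampling_rate": 0.6,        # stage 1 only, due to sparse positives
}

tokenizer = BertTokenizer.from_pretrained(CONFIG["encoder"])
encoder = BertModel.from_pretrained(CONFIG["encoder"])

class CausalityClassifier(torch.nn.Module):
    """Hypothetical feed-forward head over 768-d BERT features."""
    def __init__(self, hidden=768, n_classes=2):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(hidden, hidden),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden, n_classes),
        )

    def forward(self, features):
        return self.net(features)

classifier = CausalityClassifier()
optimizer = torch.optim.AdamW(
    list(encoder.parameters()) + list(classifier.parameters()),
    lr=CONFIG["lr"],
    weight_decay=CONFIG["weight_decay"],
)
```

The negative sampling rate and k are consumed by the data pipeline and the analogy-retrieval module, respectively, which this sketch does not implement; see the authors' repository for the full model.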