Discourse-Level Event Temporal Ordering with Uncertainty-Guided Graph Completion

Authors: Jian Liu, Jinan Xu, Yufeng Chen, Yujie Zhang

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To verify the effectiveness of our approach, we have conducted extensive experiments on the standard benchmark datasets [Naik et al., 2019]. The experimental results demonstrate that our approach consistently outperforms previous methods and sets up a new state-of-the-art.
Researcher Affiliation | Academia | Jian Liu, Jinan Xu, Yufeng Chen and Yujie Zhang, Beijing Jiaotong University, School of Computer and Information Technology, China. jianliu@bjtu.edu.cn, jaxu@bjtu.edu.cn, chenyf@bjtu.edu.cn, yjzhang@bjtu.edu.cn
Pseudocode | Yes | Algorithm 1 (Certain-First Graph Completion). Input: a test document D annotated with a set of events E_D; Output: a complete graph G. 1: Transfer D into an empty graph G with nodes being E_D; 2: while G is not completed do; 3: predict TLINKs for all the missing edges in G; 4: estimate the uncertainties of the TLINKs via Eq. (4); 5: select the TLINK with the minimal uncertainty value as the current prediction and insert the edge into G; 6: end while. (A hedged sketch of this loop appears after the table.)
Open Source Code | Yes | We have released our code at https://github.com/jianliu-ml/EventTemp to facilitate further exploration.
Open Datasets | Yes | We use TDDiscourse, the largest discourse-level event temporal ordering benchmark, as the test bed [Naik et al., 2019]. It includes two subsets: 1) TDD-Man, which augments TimeBank-Dense (TBDense) [Cassidy et al., 2014] by manually annotating TLINKs between event pairs that are more than one sentence apart; 2) TDD-Auto, which derives new TLINKs in the document with automatic inference rules. Table 1 and Table 2 compare the sizes and label distributions of TBDense, TDD-Man, and TDD-Auto.
Dataset Splits | Yes | Table 1 (number of temporal relations, train / dev / test): TBDense [Cassidy et al., 2014]: 4,032 / 629 / 1,427; TDD-Man [Naik et al., 2019]: 4,000 / 650 / 1,500; TDD-Auto [Naik et al., 2019]: 32,609 / 1,435 / 4,258.
Hardware Specification | No | The paper does not mention the hardware (e.g., GPU model, CPU type, memory size) used to run the experiments. It mentions the DEEP GRAPH LIBRARY, but that is a software library, not hardware.
Software Dependencies | No | The paper mentions several software components (DEEP GRAPH LIBRARY (DGL), the BERT-Base architecture, GloVe embeddings, the Adam optimizer, and the PuLP library) but does not provide specific version numbers for any of them.
Experiment Setup | Yes | The hyper-parameters of our model are tuned on the development set of TDDiscourse. For graph mask pre-training, the mask proportion is set to 5% (chosen from 1% to 50%, cf. Section 6.1). The number of RGCN layers is set to 3 (chosen from [1, 2, 3, 4, 5]), and we use the DEEP GRAPH LIBRARY (DGL) to implement the graph convolution algorithm. To learn node representations, the BERT encoder uses the BERT-Base architecture, and the BiLSTM encoder uses GloVe embeddings [Pennington et al., 2014] with a hidden dimension of 256 (chosen from [64, 128, 256, 512]). In uncertainty modeling, K is set to 20 to balance speed and efficiency. (A hedged DGL sketch of the RGCN stack appears after the table.)
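
The Pseudocode row describes a greedy, certain-first completion loop. Below is a minimal Python sketch of that loop, assuming a hypothetical predict_tlink classifier that returns a label and an Eq. (4)-style uncertainty for a candidate edge; it is an illustration under those assumptions, not the authors' released implementation.

```python
# Hedged sketch of the certain-first completion loop (Algorithm 1).
# predict_tlink is an assumed interface, not part of the authors' code.
import itertools

def certain_first_completion(events, predict_tlink):
    """Greedily complete the event graph, most-certain edge first.

    events:        list of event identifiers in the document
    predict_tlink: callable (graph, e1, e2) -> (label, uncertainty),
                   where uncertainty plays the role of Eq. (4)
    """
    graph = {}  # (e1, e2) -> TLINK label; starts empty (step 1)
    all_pairs = list(itertools.combinations(events, 2))

    while len(graph) < len(all_pairs):                         # step 2
        best = None
        for e1, e2 in all_pairs:
            if (e1, e2) in graph:
                continue
            label, uncertainty = predict_tlink(graph, e1, e2)  # steps 3-4
            if best is None or uncertainty < best[0]:
                best = (uncertainty, e1, e2, label)
        _, e1, e2, label = best
        graph[(e1, e2)] = label                                # step 5
    return graph
```

Because the graph grows by one edge per iteration, predictions made later can condition on the edges already committed, which is the point of ordering insertions by certainty.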
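The Experiment Setup row reports a 3-layer RGCN with hidden size 256 implemented with the DEEP GRAPH LIBRARY. The sketch below shows how such a stack could be assembled with DGL's RelGraphConv; the module name EventGraphEncoder, the number of relation types (num_rels=5), and the toy graph are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch of a 3-layer RGCN stack in DGL, matching the reported
# hyper-parameters (3 layers, hidden dimension 256). num_rels and the
# module layout are assumptions for illustration.
import torch
import torch.nn as nn
import dgl
from dgl.nn import RelGraphConv

class EventGraphEncoder(nn.Module):
    def __init__(self, in_dim=256, hidden_dim=256, num_rels=5, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList([
            RelGraphConv(in_dim if i == 0 else hidden_dim, hidden_dim,
                         num_rels, activation=torch.relu)
            for i in range(num_layers)
        ])

    def forward(self, graph, node_feats, edge_types):
        h = node_feats
        for layer in self.layers:
            h = layer(graph, h, edge_types)  # relational graph convolution
        return h

# Minimal usage on a toy 3-node event graph.
g = dgl.graph(([0, 1, 2], [1, 2, 0]))
feats = torch.randn(3, 256)        # e.g., BERT or BiLSTM node representations
etypes = torch.tensor([0, 1, 2])   # one relation id per edge
out = EventGraphEncoder()(g, feats, etypes)
```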