Discourse-Level Event Temporal Ordering with Uncertainty-Guided Graph Completion
Authors: Jian Liu, Jinan Xu, Yufeng Chen, Yujie Zhang
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To verify the effectiveness of our approach, we have conducted extensive experiments on the standard benchmark datasets [Naik et al., 2019]. The experimental results demonstrate that our approach consistently outperforms previous methods and establishes a new state of the art. |
| Researcher Affiliation | Academia | Jian Liu, Jinan Xu, Yufeng Chen and Yujie Zhang, Beijing Jiaotong University, School of Computer and Information Technology, China; jianliu@bjtu.edu.cn, jaxu@bjtu.edu.cn, chenyf@bjtu.edu.cn, yjzhang@bjtu.edu.cn |
| Pseudocode | Yes | Algorithm 1: Certain-First Graph Completion. Input: a test document D annotated with a set of events ED. Output: a complete graph G. 1: Transform D into an empty graph G whose nodes are ED; 2: while G is not complete do; 3: predict TLINKs for all missing edges in G; 4: estimate the uncertainties of the TLINKs via Eq. (4); 5: select the TLINK with the minimal uncertainty value as the current prediction and insert its edge into G; 6: end while. (A runnable sketch of this loop is given below the table.) |
| Open Source Code | Yes | We have released our code at https://github.com/jianliu-ml/EventTemp to facilitate further exploration. |
| Open Datasets | Yes | We use TDDiscourse, the largest discourse-level event temporal ordering benchmark, as the test bed [Naik et al., 2019]. It includes two subsets: 1) TDD-Man, which augments TimeBank-Dense (TBDense) [Cassidy et al., 2014] by manually annotating TLINKs between event pairs that are more than one sentence apart. 2) TDD-Auto, which derives new TLINKs in the document with automatic inference rules. Table 1 and Table 2 compare the sizes and label distributions of TBDense, TDD-Man and TDD-Auto. |
| Dataset Splits | Yes | Table 1: Number of temporal relations (Train / Dev / Test). TBDense [Cassidy et al., 2014]: 4,032 / 629 / 1,427; TDD-Man [Naik et al., 2019]: 4,000 / 650 / 1,500; TDD-Auto [Naik et al., 2019]: 32,609 / 1,435 / 4,258. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., GPU model, CPU type, memory size) used to run the experiments. It mentions the DEEP GRAPH LIBRARY, but that is a software library. |
| Software Dependencies | No | The paper mentions several software components (the DEEP GRAPH LIBRARY (DGL), the BERT-Base architecture, GloVe embeddings, the Adam optimizer, and the PuLP library) but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | The hyper-parameters of our model are tuned on the development set of TDDiscourse. Finally, for graph mask pre-training, the mask portion is set to 5% (chosen from 1% to 50%, cf. Section 6.1). The number of RGCN layers is set to 3 (chosen from [1, 2, 3, 4, 5]), and we use the DEEP GRAPH LIBRARY (DGL) to implement the graph convolution algorithm. To learn the node representations, for the BERT encoder we use the BERT-Base architecture; for the BiLSTM encoder we use GloVe embeddings [Pennington et al., 2014] and set the hidden dimension to 256 (chosen from [64, 128, 256, 512]). In uncertainty modeling, we set K to 20 to balance speed and effectiveness. (A hedged sketch of this K-sample uncertainty estimate is given below the table.) |
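
A minimal, runnable sketch of Algorithm 1 above, assuming a pairwise classifier `predict_tlink(pair, graph)` (a hypothetical stand-in for the paper's TLINK model) that returns a `(label, uncertainty)` pair, with the uncertainty computed as in Eq. (4):

```python
from itertools import combinations

def certain_first_completion(events, predict_tlink):
    """Greedily complete the temporal graph, most-certain edge first."""
    graph = {}                              # (event_i, event_j) -> TLINK label
    missing = set(combinations(events, 2))  # all unlabeled event pairs
    while missing:
        # Re-score every missing edge; predictions can change as the
        # partially built graph supplies more context.
        scored = [(pair, *predict_tlink(pair, graph)) for pair in missing]
        # Commit only the prediction with the minimal uncertainty value.
        pair, label, _uncertainty = min(scored, key=lambda t: t[2])
        graph[pair] = label
        missing.remove(pair)
    return graph
```

The point of the certain-first ordering is that every remaining edge is re-scored on each iteration, so edges committed early (the confident ones) can inform the later, harder predictions.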
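
The exact form of Eq. (4) is not quoted in the table above, so the following sketch assumes one common instantiation: K stochastic (dropout-enabled) forward passes whose averaged predictive distribution is scored by entropy, with K = 20 as in the paper. `model` and `pair_features` are hypothetical placeholders for a PyTorch classifier over an event pair:

```python
import torch

@torch.no_grad()
def estimate_uncertainty(model, pair_features, k=20):
    """Label plus entropy-based uncertainty from k stochastic forward passes."""
    model.train()  # assumption: keep dropout active so the k passes differ
    probs = torch.stack(
        [model(pair_features).softmax(dim=-1) for _ in range(k)]
    )                                    # shape: (k, num_labels)
    mean = probs.mean(dim=0)             # averaged predictive distribution
    entropy = -(mean * mean.clamp_min(1e-12).log()).sum(dim=-1)
    return mean.argmax(dim=-1).item(), entropy.item()
```

A larger K gives a smoother uncertainty estimate at the cost of K forward passes per edge per iteration, which is the speed/effectiveness trade-off behind the paper's choice of K = 20.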