Discourse-Level Event Temporal Ordering with Uncertainty-Guided Graph Completion

Authors: Jian Liu, Jinan Xu, Yufeng Chen, Yujie Zhang

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To verify the effectiveness of our approach, we have conducted extensive experiments on the standard benchmark datasets [Naik et al., 2019]. The experimental results demonstrate that our approach consistently outperforms previous methods and sets up a new state-of-the-art.
Researcher Affiliation | Academia | Jian Liu, Jinan Xu, Yufeng Chen and Yujie Zhang, Beijing Jiaotong University, School of Computer and Information Technology, China. jianliu@bjtu.edu.cn, jaxu@bjtu.edu.cn, chenyf@bjtu.edu.cn, yjzhang@bjtu.edu.cn
Pseudocode | Yes | Algorithm 1 (Certain-First Graph Completion). Input: a test document D annotated with a set of events E_D; Output: a complete graph G. 1: Transfer D into an empty graph G with nodes being E_D; 2: while G is not completed do; 3: predict TLINKs for all the missing edges in G; 4: estimate the uncertainties of the TLINKs via Eq. (4); 5: select the TLINK with the minimal uncertainty value as the current prediction and insert the edge into G; 6: end while. (A hedged sketch of this loop appears after the table.)
Open Source Code | Yes | We have released our code at https://github.com/jianliu-ml/EventTemp to facilitate further exploration.
Open Datasets | Yes | We use TDDiscourse, the largest discourse-level event temporal ordering benchmark, as the test bed [Naik et al., 2019]. It includes two subsets: 1) TDD-Man, which augments TimeBank-Dense (TBDense) [Cassidy et al., 2014] by manually annotating TLINKs between event pairs that are more than one sentence apart; 2) TDD-Auto, which derives new TLINKs in the document with automatic inference rules. Table 1 and Table 2 compare the sizes and label distributions of TBDense, TDD-Man, and TDD-Auto.
Dataset Splits | Yes | Table 1 (number of temporal relations, train / dev / test): TBDense [Cassidy et al., 2014]: 4,032 / 629 / 1,427; TDD-Man [Naik et al., 2019]: 4,000 / 650 / 1,500; TDD-Auto [Naik et al., 2019]: 32,609 / 1,435 / 4,258.
Hardware Specification | No | The paper does not mention the hardware (e.g., GPU model, CPU type, memory size) used to run the experiments. It mentions the DEEP GRAPH LIBRARY, but that is a software library, not hardware.
Software Dependencies | No | The paper mentions several software components (DEEP GRAPH LIBRARY (DGL), the BERT-Base architecture, GloVe embeddings, the Adam optimizer, and the PuLP library) but does not provide specific version numbers for any of them.
Experiment Setup | Yes | The hyper-parameters of our model are tuned on the development set of TDDiscourse. For graph mask pre-training, the mask proportion is set to 5% (chosen from 1% to 50%, cf. Section 6.1). The number of RGCN layers is set to 3 (chosen from [1, 2, 3, 4, 5]), and we use the DEEP GRAPH LIBRARY (DGL) to implement the graph convolution algorithm. To learn node representations, the BERT encoder uses the BERT-Base architecture, and the BiLSTM encoder uses GloVe embeddings [Pennington et al., 2014] with a hidden dimension of 256 (chosen from [64, 128, 256, 512]). In uncertainty modeling, K is set to 20 to balance speed and efficiency. (A hedged DGL sketch of the RGCN stack appears after the table.)
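
The Pseudocode row describes a greedy, certain-first completion loop. Below is a minimal Python sketch of that loop, assuming a hypothetical predict_tlink classifier that returns a label and an Eq. (4)-style uncertainty for a candidate edge; it is an illustration under those assumptions, not the authors' released implementation.

```python
# Hedged sketch of the certain-first completion loop (Algorithm 1).
# predict_tlink is an assumed interface, not part of the authors' code.
import itertools

def certain_first_completion(events, predict_tlink):
    """Greedily complete the event graph, most-certain edge first.

    events:        list of event identifiers in the document
    predict_tlink: callable (graph, e1, e2) -> (label, uncertainty),
                   where uncertainty plays the role of Eq. (4)
    """
    graph = {}  # (e1, e2) -> TLINK label; starts empty (step 1)
    all_pairs = list(itertools.combinations(events, 2))

    while len(graph) < len(all_pairs):                         # step 2
        best = None
        for e1, e2 in all_pairs:
            if (e1, e2) in graph:
                continue
            label, uncertainty = predict_tlink(graph, e1, e2)  # steps 3-4
            if best is None or uncertainty < best[0]:
                best = (uncertainty, e1, e2, label)
        _, e1, e2, label = best
        graph[(e1, e2)] = label                                # step 5
    return graph
```

Because the graph grows by one edge per iteration, predictions made later can condition on the edges already committed, which is the point of ordering insertions by certainty.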
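The Experiment Setup row reports a 3-layer RGCN with hidden size 256 implemented with the DEEP GRAPH LIBRARY. The sketch below shows how such a stack could be assembled with DGL's RelGraphConv; the module name EventGraphEncoder, the number of relation types (num_rels=5), and the toy graph are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch of a 3-layer RGCN stack in DGL, matching the reported
# hyper-parameters (3 layers, hidden dimension 256). num_rels and the
# module layout are assumptions for illustration.
import torch
import torch.nn as nn
import dgl
from dgl.nn import RelGraphConv

class EventGraphEncoder(nn.Module):
    def __init__(self, in_dim=256, hidden_dim=256, num_rels=5, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList([
            RelGraphConv(in_dim if i == 0 else hidden_dim, hidden_dim,
                         num_rels, activation=torch.relu)
            for i in range(num_layers)
        ])

    def forward(self, graph, node_feats, edge_types):
        h = node_feats
        for layer in self.layers:
            h = layer(graph, h, edge_types)  # relational graph convolution
        return h

# Minimal usage on a toy 3-node event graph.
g = dgl.graph(([0, 1, 2], [1, 2, 0]))
feats = torch.randn(3, 256)        # e.g., BERT or BiLSTM node representations
etypes = torch.tensor([0, 1, 2])   # one relation id per edge
out = EventGraphEncoder()(g, feats, etypes)
```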