A Generative Approach for Script Event Prediction via Contrastive Fine-Tuning
Authors: Fangqi Zhu, Jun Gao, Changlong Yu, Wei Wang, Chen Xu, Xin Mu, Min Yang, Ruifeng Xu
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the multi-choice narrative cloze (MCNC) task demonstrate that our approach achieves better results than other state-of-the-art baselines. Our code will be available at https://github.com/zhufq00/mcnc. From the Experiments section: In this section, we introduce the datasets, experimental setting, and compared baselines. Experimental results show our method achieves state-of-the-art performance on the multi-choice narrative cloze (MCNC) task. We then perform an ablation study and a model training comparison to understand the effect of the model's key components and their variants on performance. |
| Researcher Affiliation | Academia | (1) Harbin Institute of Technology, Shenzhen; (2) Peng Cheng Laboratory; (3) Beijing University of Technology; (4) Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences |
| Pseudocode | No | The paper describes its approach in textual form and with flow diagrams, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code will be available at https://github.com/zhufq00/mcnc. |
| Open Datasets | Yes | For the original MCNC task, Granroth-Wilding and Clark (2016) extracted event chains from the widely used New York Times portion of the Gigaword corpus (Graff et al. 2003). We used the released code to reproduce the extraction pipeline, including POS tagging, dependency parsing, and coreference resolution. Compared with the original dataset, the public dataset only covers the event chains that include extracted event relations between them. We also follow the common practice of dataset splits for training, validation, and testing in Table 1. |
| Dataset Splits | Yes | Finally, the dataset contains more than 1.4 million event chains, and we split the training, validation, and test sets as Bai et al. (2021). We denote it as the original dataset, and the statistics are shown in Table 1. Table 1 (statistics of the reproduced original dataset (Granroth-Wilding and Clark 2016) and the public dataset (Li, Ding, and Liu 2018)), original / public: Train set 1,440,295 / 140,331; Dev set 10,000 / 10,000; Test set 10,000 / 10,000. |
| Hardware Specification | Yes | All the experiments are conducted on a Tesla A100 GPU. |
| Software Dependencies | No | In the event-centric pretraining stage, we use BART (Lewis et al. 2020) as the backbone for this task and introduce a novel event-level blank infilling strategy as the learning objective to inject event-level knowledge into the pretrained language model. In recent years, pre-trained models like BART (Lewis et al. 2020) have brought significant improvements on various downstream tasks such as question answering, summarization, and machine translation. Therefore, we adopt BART as the underlying architecture for our model to model the conditional probability distribution P(Y\|X) (a minimal scoring sketch appears after this table). For these two stages, the model is optimized by Adam (Kingma and Ba 2014). The paper mentions BART and the Adam optimizer, but it does not specify software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow, or CUDA versions) necessary for replication. |
| Experiment Setup | Yes | The learning rate and weight decay are 1e-5 and 1e-6, respectively. Our model uses an early-stop strategy to select the best epoch, with patience set to 5. For BART-base, the batch sizes are 256 and 64 in the two stages, respectively. For BART-large, the batch sizes are 128 and 32 in the two stages, respectively. For the task-specific contrastive fine-tuning stage, the learning rate is chosen from {1e-5, 2e-5, 3e-5}. (See the training-loop sketch after the table.) |
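
The paper casts MCNC as picking the candidate event to which the generative model assigns the highest conditional probability P(Y|X). The sketch below is not the authors' released code; the checkpoint name, the prompt format, and the sum-of-token-log-probabilities scoring rule are illustrative assumptions about how a BART model can rank candidates this way:

```python
# Minimal sketch: score each MCNC candidate by its log-probability under BART,
# conditioned on the event-chain context, then pick the argmax.
import torch
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
model.eval()

def candidate_log_prob(context: str, candidate: str) -> float:
    """Sum of token log-probabilities of `candidate` given `context`."""
    inputs = tokenizer(context, return_tensors="pt")
    labels = tokenizer(candidate, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(**inputs, labels=labels).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    # Pick out the log-probability of each gold label token.
    token_log_probs = log_probs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    return token_log_probs.sum().item()

context = "X ordered food. X ate the meal. X asked for the bill."
candidates = ["X paid the bill.", "X boarded a plane."]
print(max(candidates, key=lambda c: candidate_log_prob(context, c)))
```

A common variant is length normalization (dividing by the candidate's token count) so that longer candidates are not systematically penalized.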
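
The reported optimization setup (Adam, learning rate 1e-5, weight decay 1e-6, early stopping with patience 5) can be sketched as follows; the tiny linear model and random tensors are illustrative stand-ins for the actual BART model and MCNC data:

```python
# Minimal sketch of the reported optimizer + early-stopping configuration.
import torch
import torch.nn as nn

model = nn.Linear(16, 2)  # stand-in for BART-base/large
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5, weight_decay=1e-6)
loss_fn = nn.CrossEntropyLoss()
x_val, y_val = torch.randn(64, 16), torch.randint(0, 2, (64,))

best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(100):
    model.train()
    x, y = torch.randn(64, 16), torch.randint(0, 2, (64,))  # dummy batch
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(x_val), y_val).item()
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0  # best epoch: checkpoint here
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # stop after 5 epochs without validation improvement
```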