Story Ending Prediction by Transferable BERT
Authors: Zhongyang Li, Xiao Ding, Ting Liu
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this study, we take story ending prediction as the target task to conduct experiments. The final result, an accuracy of 91.8%, dramatically outperforms previous state-of-the-art baseline methods. Several comparative experiments give some helpful suggestions on how to select transfer tasks to improve BERT. |
| Researcher Affiliation | Academia | Zhongyang Li, Xiao Ding and Ting Liu, Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology {zyli, xding, tliu}@ir.hit.edu.cn |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It describes processes in text and diagrams. |
| Open Source Code | Yes | All of our experiments were based on https://github.com/huggingface/pytorch-pretrained-BERT. We also released our code at https://github.com/eecrazy/TransBERT-ijcai2019. |
| Open Datasets | Yes | SCT v1.0 [Mostafazadeh et al., 2016] is the widely used version. ... SCT v1.5 [Sharma et al., 2018] is a recently released revised version... |
| Dataset Splits | Yes | Here we only use the development and test datasets, and split development set into 1,771 instances for training and 100 instances for development purposes. ... The detailed dataset statistics are shown in Table 1. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or memory specifications, used for running the experiments. |
| Software Dependencies | No | The paper states: 'All of our experiments were based on https://github.com/huggingface/pytorch-pretrained-BERT.' While it mentions a library, it does not provide a specific version number for it or any other software component. |
| Experiment Setup | Yes | We train each transfer task and the SCT with 3 epochs monitoring on the development set, using a cross-entropy objective. Other hyperparameters follow [Devlin et al., 2018]. |
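The Experiment Setup row quotes the paper's cross-entropy training objective over candidate story endings. As a minimal sketch (not the authors' code), the loss for one SCT instance can be written as a softmax over per-ending scores followed by the negative log-likelihood of the gold ending; the scores and indices below are hypothetical.

```python
import math

def cross_entropy_over_endings(scores, gold_index):
    """Softmax over candidate-ending scores, then the negative
    log-likelihood of the gold ending (a sketch of the
    cross-entropy objective described in the paper)."""
    m = max(scores)                                # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -math.log(probs[gold_index])

# SCT gives two candidate endings per story; assume the model
# scores the gold ending higher (hypothetical numbers).
loss = cross_entropy_over_endings([2.0, 0.5], gold_index=0)
```

In the released TransBERT setup this score would come from a BERT-based classifier head applied to each story/ending pair; the sketch only illustrates how the per-instance loss is computed from those scores.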