Incorporating Structured Commonsense Knowledge in Story Completion

Authors: Jiaao Chen, Jianshu Chen, Zhou Yu (pp. 6244–6251)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that our model outperforms state-of-the-art approaches on a public dataset, ROCStory Cloze Task (Mostafazadeh et al. 2017), and the performance gain from adding the additional commonsense knowledge is significant. [...] We evaluated our model on ROCStories (Mostafazadeh et al. 2017), a publicly available collection of commonsense short stories. [...] We evaluated baselines and our model using accuracy as the metric on the ROCStories dataset, and summarized these results in Table 2. [...] We conducted another two groups of experiments to investigate the contribution of the three different types of information: narrative sequence, sentiment evolution and commonsense knowledge.
Researcher Affiliation | Collaboration | Jiaao Chen (1), Jianshu Chen (2), Zhou Yu (3); (1) Zhejiang University, (2) Tencent AI Lab, (3) University of California, Davis; 3150105589@zju.edu.cn, jianshuchen@tencent.com, joyu@ucdavis.edu
Pseudocode | Yes | Algorithm 1: Knowledge distance computation (a hedged implementation sketch appears after the table).
Open Source Code | No | The paper mentions "pre-trained parameters released by OpenAI" with a link to https://github.com/openai/finetune-transformer-lm (footnote 1). However, this is a third-party resource used by the authors, not the open-source code for the methodology described in *this* paper.
Open Datasets | Yes | We evaluated our model on ROCStories (Mostafazadeh et al. 2017), a publicly available collection of commonsense short stories. [...] The published ROCStories dataset (footnote 2: http://cs.rochester.edu/nlp/rocstories) is constructed with ROCStories as a training set that includes 98,162 stories that exclude candidate wrong endings, an evaluation set, and a test set, which have the same structure (1 body + 2 candidate endings) and a size of 1,871. A loading sketch follows the table.
Dataset Splits | Yes | For learning to select the right ending, we randomly split 80% of the stories with two candidate endings in the ROCStories evaluation set as our training set (1,479 cases), and 20% of the stories in the ROCStories evaluation set as our validation set (374 cases). See the split sketch below the table.
Hardware Specification | No | The paper does not explicitly mention any specific hardware (e.g., GPU model, CPU type, memory) used to run the experiments.
Software Dependencies | No | The paper mentions software components such as "NLTK and Stanford's CoreNLP tools (Manning et al. 2014)" and "VADER (Hutto and Gilbert 2014)", and states that Adam is used "to train all parameters", but it does not specify version numbers for these tools or for Python, the implied implementation language. A VADER usage sketch is given after the table.
Experiment Setup | Yes | Specifically, we set the dimension of the LSTM for sentiment prediction to 64. We use a mini-batch size of 8 and Adam to train all parameters. The learning rate is set to 0.001 initially with a decay rate of 0.5 per epoch. A training-configuration sketch closes out the examples below.
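The Pseudocode row refers to the paper's Algorithm 1 (knowledge distance computation), which is reproduced only as pseudocode. Below is a minimal Python sketch of one plausible reading, assuming the distance is a capped shortest-path hop count between word nodes in a ConceptNet-style adjacency map; the graph representation, the `max_dist` cap, and the function name are assumptions, not the authors' code.

```python
from collections import deque

def knowledge_distance(graph, source, target, max_dist=3):
    """Breadth-first shortest-path distance between two word nodes.

    `graph` maps each concept to an iterable of neighboring concepts
    (a ConceptNet-style adjacency dict, assumed here). Word pairs
    farther apart than `max_dist` hops are treated as unrelated and
    receive the cap value.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist >= max_dist:
            continue
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return max_dist  # cap: no short path found
```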
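For the Open Datasets row, a hedged loading sketch for the evaluation/test files with the "1 body + 2 candidate endings" structure; the column names below follow the publicly distributed Story Cloze CSV and are an assumption, so adjust them if a local copy differs.

```python
import csv

def load_cloze_eval(path):
    """Read (body, endings, label) triples from a Story Cloze CSV."""
    examples = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Four body sentences followed by two candidate endings.
            body = " ".join(row[f"InputSentence{i}"] for i in range(1, 5))
            endings = [row["RandomFifthSentenceQuiz1"],
                       row["RandomFifthSentenceQuiz2"]]
            label = int(row["AnswerRightEnding"]) - 1  # 1/2 -> 0/1
            examples.append((body, endings, label))
    return examples
```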
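The 80/20 split described in the Dataset Splits row can be reproduced along these lines; the shuffling seed and rounding behavior are assumptions (the paper reports 1,479 training and 374 validation cases).

```python
import random

def split_eval_set(examples, train_frac=0.8, seed=0):
    """Randomly split the ROCStories evaluation set into train/validation.

    With the 1,871-story evaluation set this yields splits on the order
    of the 1,479 / 374 cases reported in the paper; the exact counts
    depend on seed and rounding, which the paper does not specify.
    """
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```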
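The Software Dependencies row notes that no versions are given for the listed tools. As an illustration of the sentiment component, here is a minimal sketch using NLTK's bundled VADER interface; this is one common packaging of VADER, and the paper does not say which distribution the authors used.

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time lexicon download

analyzer = SentimentIntensityAnalyzer()
for sentence in ["Sam was thrilled with his new job.",
                 "The interview went terribly."]:
    scores = analyzer.polarity_scores(sentence)
    # 'compound' is a normalized sentiment score in [-1, 1]
    print(sentence, "->", scores["compound"])
```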
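Finally, the hyperparameters quoted in the Experiment Setup row translate into a short configuration sketch. PyTorch is an assumption (the paper does not name its framework), as are the input size, the model wiring, and the use of ExponentialLR to realize the per-epoch decay of 0.5.

```python
import torch
import torch.nn as nn

# 64-dim LSTM for sentiment prediction, per the paper; input_size is assumed.
sentiment_lstm = nn.LSTM(input_size=300, hidden_size=64, batch_first=True)

# Adam with an initial learning rate of 0.001, per the paper.
optimizer = torch.optim.Adam(sentiment_lstm.parameters(), lr=0.001)

# Per-epoch decay of 0.5, realized here with an exponential schedule.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.5)

# Training loop outline (mini-batch size of 8, per the paper):
# for epoch in range(num_epochs):
#     for batch in loader:
#         ...forward pass, loss, optimizer.step()...
#     scheduler.step()  # halve the learning rate after each epoch
```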