Unsupervised Storyline Extraction from News Articles

Authors: Deyu Zhou, Haiyang Xu, Xin-Yu Dai, Yulan He

IJCAI 2016

Reproducibility Variable Result LLM Response
Research Type Experimental The proposed model has been evaluated on three news corpora and the experimental results show that it outperforms several baseline approaches.
Researcher Affiliation Academia School of Computer Science and Engineering, Southeast University, China; State Key Laboratory for Novel Software Technology, Nanjing University, China; School of Engineering and Applied Science, Aston University, UK
Pseudocode No The paper describes the generative process and inference steps in text but does not include structured pseudocode or an algorithm block.
Open Source Code No The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets Yes We crawled and parsed the GDELT Event Database (http://data.gdeltproject.org/events/index.html) containing news articles published in the month of May in 2014.
Dataset Splits No The paper describes the datasets used (Dataset I, II, and III) but does not provide specific training, validation, and test splits with percentages, counts, or a clear methodology for reproducible data partitioning.
Hardware Specification No The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instances) used for running the experiments.
Software Dependencies No The paper mentions using the 'Stanford Named Entity Recognizer' and 'light LDA' but does not specify version numbers for these or any other ancillary software dependencies, which would be required for reproducibility.
Experiment Setup Yes The hyperparameters of the model are set = 1, λ = 0.5, t bg = 0.1, t bg = 0.01, t z = 0.7 (s ∈ 1..S_t, t ∈ 1..T) in our experiment. For SDM, the storyline number is set to 100 on Dataset II and 30 on Dataset III. The topic number is set to 100 on Dataset II and 20 on Dataset III. The number of historical epochs M, which is taken into account for setting the Dirichlet priors for the storyline-keyword, storyline-location, storyline-person, and storyline-organization distributions, is set to 7, the same as in our proposed approach.
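The setup above says only that the previous M = 7 epochs are used to set the Dirichlet priors of the per-storyline distributions; it does not say how they are combined. The sketch below is therefore hypothetical: the function name `historical_prior`, the uniform averaging over the window, and the `base` and `scale` pseudo-count parameters are illustrative assumptions, not the paper's method. It only shows the general mechanics of turning a sliding window of earlier per-epoch distributions into a prior for the current epoch.

```python
from typing import List, Sequence


def historical_prior(
    history: Sequence[Sequence[float]],
    vocab_size: int,
    m: int = 7,
    base: float = 0.01,
    scale: float = 1.0,
) -> List[float]:
    """Hypothetical sketch: build a Dirichlet prior for epoch t from the last m epochs.

    history: per-epoch probability vectors over the vocabulary, oldest first.
    Assumption (not from the paper): the last m vectors are averaged with
    uniform weights, multiplied by `scale`, and added to a symmetric
    pseudo-count `base` so that no prior entry is zero.
    """
    if m <= 0:
        raise ValueError("m must be positive")
    prior = [base] * vocab_size
    window = list(history)[-m:]  # at most the m most recent epochs
    if not window:
        # First epoch: no history yet, so fall back to a symmetric prior.
        return prior
    weight = scale / len(window)
    for dist in window:
        if len(dist) != vocab_size:
            raise ValueError("every distribution must have length vocab_size")
        for w, p in enumerate(dist):
            prior[w] += weight * p
    return prior
```

With ten epochs of history and m = 7, only the seven most recent vectors contribute. A decaying weighting that favours recent epochs would be an equally plausible alternative; the paper should be consulted for the actual scheme.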