Knowledge-Enriched Visual Storytelling

Authors: Chao-Chun Hsu, Zi-Yuan Chen, Chi-Yang Hsu, Chih-Chia Li, Tzu-Yuan Lin, Ting-Hao Huang, Lun-Wei Ku. Pages 7952-7960.

AAAI 2020

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Per the human ranking evaluation, stories generated by KG-Story are on average ranked higher than those of state-of-the-art systems. |
| Researcher Affiliation | Collaboration | 1. University of Colorado Boulder; 2. Academia Sinica; 3. Pennsylvania State University; 4. National Chiao Tung University; 5. National Taiwan University; 6. MOST Joint Research Center for AI Technology and All Vista Healthcare |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and output stories are available at https://github.com/zychen423/KE-VIST. |
| Open Datasets | Yes | Four datasets were used in this paper: Visual Genome (Krishna et al. 2016), Open IE, ROCStories Corpora (Mostafazadeh et al. 2016), and the VIST Dataset (Huang et al. 2016). |
| Dataset Splits | No | The paper mentions using specific datasets for training and fine-tuning, such as the ROCStories Corpora and the VIST Dataset, but does not provide explicit training/validation/test split percentages or sample counts for reproducibility. |
| Hardware Specification | No | The paper does not describe the hardware used to run its experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper mentions several software components (Faster R-CNN, Transformer, GRU, the Adam optimizer, spaCy, and Open-SESAME) but does not provide version numbers for these dependencies. |
| Experiment Setup | Yes | In all of our experiments, we used the same hyperparameters to train our model. The hidden size of the term prediction and story generation models was set to 512. The head and layer numbers of the Transformer encoder were 2 and 4, respectively. Both models were trained with the Adam optimizer with an initial learning rate of 1e-3, which decayed with the growth of training steps. During decoding, the beam size was set to 3 for both modules. |
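The reported hyperparameters can be collected into a single configuration for reference. This is a minimal sketch: the key names and the `lr_at_step` decay function are illustrative assumptions, not taken from the authors' released code, and the exact decay schedule is not specified in the paper.

```python
# Hyperparameters reported in the paper's experiment setup.
# Key names are illustrative; the released code may organize them differently.
CONFIG = {
    "hidden_size": 512,    # term prediction and story generation models
    "encoder_heads": 2,    # Transformer encoder attention heads
    "encoder_layers": 4,   # Transformer encoder layers
    "optimizer": "adam",
    "initial_lr": 1e-3,    # decays as training steps grow
    "beam_size": 3,        # beam search width for both decoding modules
}

def lr_at_step(step: int, initial_lr: float = 1e-3) -> float:
    """Assumed inverse-square-root decay; the paper only states that the
    learning rate 'decayed with the growth of training steps'."""
    return initial_lr / max(step, 1) ** 0.5
```

For example, under this assumed schedule the learning rate after 4 steps would be halved to 5e-4; any schedule that shrinks with the step count would match the paper's description equally well.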