Knowledge-Enriched Visual Storytelling
Authors: Chao-Chun Hsu, Zi-Yuan Chen, Chi-Yang Hsu, Chih-Chia Li, Tzu-Yuan Lin, Ting-Hao Huang, Lun-Wei Ku
AAAI 2020, pp. 7952-7960 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Per the human ranking evaluation, stories generated by KG-Story are on average ranked higher than those produced by the state-of-the-art systems. |
| Researcher Affiliation | Collaboration | (1) University of Colorado Boulder, (2) Academia Sinica, (3) Pennsylvania State University, (4) National Chiao Tung University, (5) National Taiwan University, (6) MOST Joint Research Center for AI Technology and All Vista Healthcare |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and output stories are available at https://github.com/zychen423/KE-VIST. |
| Open Datasets | Yes | Four datasets were used in this paper: Visual Genome (Krishna et al. 2016), Open IE, ROCStories Corpora (Mostafazadeh et al. 2016), and the VIST dataset (Huang et al. 2016). |
| Dataset Splits | No | The paper mentions using specific datasets for training and fine-tuning, such as ROCStories Corpora and VIST Dataset, but does not provide explicit training/test/validation split percentages or sample counts for reproducibility. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper mentions several software components, such as Faster R-CNN, the Transformer, GRUs, the Adam optimizer, spaCy, and open-SESAME, but it does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | All experiments used the same hyperparameters. The hidden size of the term-prediction and story-generation models was set to 512. The Transformer encoder used 2 attention heads and 4 layers. Both models were trained with the Adam optimizer at an initial learning rate of 1e-3, which decayed as training progressed. During decoding, the beam size was set to 3 for both modules (see the configuration sketch below the table). |
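
The hyperparameters quoted in the Experiment Setup row are concrete enough to sketch. Below is a minimal, hypothetical PyTorch illustration of that configuration; it is not the authors' implementation (which is available at https://github.com/zychen423/KE-VIST), and the module wiring and the exact decay schedule are assumptions, since the paper states only the values listed above.

```python
import torch
import torch.nn as nn

# Values reported in the paper's experiment setup.
HIDDEN_SIZE = 512   # hidden size of the term-prediction and story-generation models
NUM_HEADS = 2       # Transformer encoder attention heads
NUM_LAYERS = 4      # Transformer encoder layers
BEAM_SIZE = 3       # beam width used when decoding both modules

# A plain nn.TransformerEncoder stack matching the reported shape;
# the surrounding model (embeddings, decoder, output head) is omitted.
encoder_layer = nn.TransformerEncoderLayer(
    d_model=HIDDEN_SIZE, nhead=NUM_HEADS, batch_first=True
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=NUM_LAYERS)

# Adam with the reported initial learning rate of 1e-3.
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

# The paper says only that the rate "decayed with the growth of training
# steps"; an inverse-square-root schedule is one common choice and is
# purely an assumption here.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: 1.0 / max(1, step) ** 0.5
)
```

A sketch like this is mainly useful as a reproducibility checklist: the paper pins the architecture width, depth, head count, optimizer, initial learning rate, and beam size, but leaves the decay schedule, batch size, and hardware unspecified, which matches the "No" entries for Dataset Splits, Hardware Specification, and Software Dependencies above.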