Hide-and-Tell: Learning to Bridge Photo Streams for Visual Storytelling

Authors: Yunjae Jung, Dahun Kim, Sanghyun Woo, Kyungsu Kim, Sungjin Kim, In So Kweon

AAAI 2020, pp. 11213-11220 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In experiments, we show that our scheme of hide-and-tell, and the network design are indeed effective at storytelling, and that our model outperforms previous state-of-the-art methods in automatic metrics."
Researcher Affiliation | Collaboration | Yunjae Jung (1), Dahun Kim (1), Sanghyun Woo (1), Kyungsu Kim (2), Sungjin Kim (2), In So Kweon (1); (1) Korea Advanced Institute of Science and Technology (KAIST), Korea; (2) Samsung Electronics Co., Ltd. (Samsung Research), Korea
Pseudocode | No | The paper includes architectural diagrams and mathematical equations but does not present any formal pseudocode or algorithm blocks.
Open Source Code | No | The paper contains no explicit statement or link indicating that source code for the described method is publicly available.
Open Datasets | Yes | "Our experiments are conducted on the VIST dataset, which provides 210,819 unique photos from 10,117 Flickr albums for visual storytelling tasks. For a fair comparison, we follow the conventional experimental settings used in existing methods (Yu, Bansal, and Berg 2017; Wang et al. 2018c)."
Dataset Splits | Yes | "Also, the same numbers of training, validation, and test samples are used: 4,098, 4,988, and 5,050."
Hardware Specification | No | The paper does not report hardware details such as GPU models, CPU types, or memory used for the experiments; it mentions only generic components such as a pre-trained CNN and the Adam optimizer.
Software Dependencies | No | The paper mentions components such as ResNet-152, the Adam optimizer, ReLU, and SELU, but it does not name any software libraries or frameworks with version numbers (e.g., "PyTorch 1.9" or "Python 3.8") that would be needed for reproduction.
Experiment Setup | Yes | "We empirically choose hyperparameters for curriculum learning: α = 50, β = 80. The learning rate starts at 4e-4, and it decays by half when the training difficulty is changed (i.e., epoch = α or β). The Adam optimizer is used. For non-linearity in the network, ReLU (Nair and Hinton 2010) is used for the pre-trained CNN layers and SELU (Klambauer et al. 2017) is employed for the imagining step and the telling step. In the decoding stage, beam search is utilized with beam size = 3.
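
Although no code is released, the quoted setup is concrete enough to sketch the optimization schedule. Below is a minimal PyTorch sketch, assuming only what the quote states: Adam, an initial learning rate of 4e-4 halved at epochs α = 50 and β = 80, and a decoding beam size of 3. The model, total epoch count, and training loop body are hypothetical placeholders, since the actual hide-and-tell network is not public.

```python
# Sketch of the reported optimization schedule (not the authors' code).
# Hypothetical placeholders: `model`, the 100-epoch budget, the loop body.
import torch
import torch.nn as nn

model = nn.Linear(2048, 512)  # stand-in for the hide-and-tell network

# Adam with the reported initial learning rate of 4e-4.
optimizer = torch.optim.Adam(model.parameters(), lr=4e-4)

# Halve the learning rate when the curriculum difficulty changes,
# i.e., at epoch alpha = 50 and again at epoch beta = 80.
alpha, beta = 50, 80
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[alpha, beta], gamma=0.5)

for epoch in range(100):
    # ... one training pass over the VIST stories would go here ...
    scheduler.step()

BEAM_SIZE = 3  # beam width used during decoding, per the paper
```

MultiStepLR reproduces the decay-by-half at epochs α and β exactly; the curriculum mechanism itself (which photos are hidden at each stage) is a separate component of the method and is not shown here.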