Densely Supervised Hierarchical Policy-Value Network for Image Paragraph Generation

Authors: Siying Wu, Zheng-Jun Zha, Zilei Wang, Houqiang Li, Feng Wu

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the Stanford image-paragraph benchmark have demonstrated the effectiveness of the proposed DHPV approach with performance improvements over multiple state-of-the-art methods.
Researcher Affiliation | Academia | Siying Wu, Zheng-Jun Zha, Zilei Wang, Houqiang Li and Feng Wu, National Engineering Laboratory for Brain-inspired Intelligence Technology and Application, University of Science and Technology of China. wsy315@mail.ustc.edu.cn, {zhazj,zlwang,lihq,fengwu}@ustc.edu.cn
Pseudocode | No | The paper describes the model architecture and training process using mathematical equations and figures, but no explicit pseudocode or algorithm blocks are provided.
Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | We conduct experiments on the Stanford image-paragraph dataset released in [Krause et al., 2017].
Dataset Splits | Yes | The dataset was split into training, testing and validation subsets with 14,575 images, 2,489 images and 2,487 images, respectively [Krause et al., 2017].
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries).
Experiment Setup | Yes | We set the dimension of the hidden layers to 512 for all LSTM cells in our network. We set the hyper-parameters λs and λw to 5.0 and 1.0, respectively. Following the common setting [Krause et al., 2017], we generate at most 6 sentences, each with a maximum of 30 words, to describe a given image. We use the SGD solver with an initial learning rate of 1e-4 to train the network for the first 3 epochs; the learning rate then decays stepwise every 3 training epochs. In addition, we sample K = 20 rollout sequences in the word-level reward computation. We first pre-train the hierarchical policy network for 50 epochs with the cross-entropy loss in Eq. (17).
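The reported experiment setup maps onto a standard training configuration. The following is a minimal PyTorch-style sketch that only collects the stated hyper-parameters; since the paper releases no code, the module structure, decay factor (gamma), vocabulary size, and visual feature dimension below are assumptions for illustration, not the authors' implementation.

# Minimal sketch of the reported training configuration (hypothetical code;
# the paper provides no implementation, so module names, vocab_size, feat_dim
# and the decay factor are assumptions).
import torch
import torch.nn as nn

HIDDEN_DIM = 512                   # hidden dimension of all LSTM cells (reported)
LAMBDA_S, LAMBDA_W = 5.0, 1.0      # sentence- and word-level weights λs, λw (reported)
MAX_SENTENCES, MAX_WORDS = 6, 30   # paragraph length limits (reported)
NUM_ROLLOUTS = 20                  # K rollout sequences for word-level reward (reported)
PRETRAIN_EPOCHS = 50               # cross-entropy pre-training epochs (reported)

class HierarchicalPolicyNet(nn.Module):
    """Hypothetical stand-in for the hierarchical policy network."""
    def __init__(self, vocab_size: int, feat_dim: int = 2048):
        super().__init__()
        self.sentence_lstm = nn.LSTMCell(feat_dim, HIDDEN_DIM)
        self.word_lstm = nn.LSTMCell(HIDDEN_DIM, HIDDEN_DIM)
        self.word_head = nn.Linear(HIDDEN_DIM, vocab_size)

model = HierarchicalPolicyNet(vocab_size=10000)

# SGD with an initial learning rate of 1e-4, then stepwise decay every
# 3 training epochs (the decay factor is not reported; 0.8 is assumed).
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.8)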