Densely Supervised Hierarchical Policy-Value Network for Image Paragraph Generation
Authors: Siying Wu, Zheng-Jun Zha, Zilei Wang, Houqiang Li, Feng Wu
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the Stanford image-paragraph benchmark have demonstrated the effectiveness of the proposed DHPV approach with performance improvements over multiple state-of-the-art methods. |
| Researcher Affiliation | Academia | Siying Wu, Zheng-Jun Zha, Zilei Wang, Houqiang Li and Feng Wu, National Engineering Laboratory for Brain-inspired Intelligence Technology and Application, University of Science and Technology of China. wsy315@mail.ustc.edu.cn, {zhazj,zlwang,lihq,fengwu}@ustc.edu.cn |
| Pseudocode | No | The paper describes the model architecture and training process using mathematical equations and figures, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We conduct experiments on the Stanford image-paragraph dataset released in [Krause et al., 2017] |
| Dataset Splits | Yes | The dataset was split into training, testing and validation subsets with 14,575 images, 2,489 images and 2,487 images, respectively [Krause et al., 2017]. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries). |
| Experiment Setup | Yes | We set the dimension of hidden layers as 512 for all the LSTM cells in our network. We set the hyper-parameters λs and λw as 5.0 and 1.0, respectively. Following the common setting [Krause et al., 2017], we generate 6 sentences at most, with a maximum of 30 words each, to describe a given image. We use an SGD solver with an initial learning rate of 1e-4 to train the network for the first 3 epochs. Then the learning rate is decayed in steps every 3 training epochs. In addition, we sample K = 20 rollout sequences in word-level reward computation. We first pre-train the hierarchical policy network for 50 epochs with the cross-entropy loss in Eq. (17). |
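The reported setup can be collected into a small configuration sketch with the stepped learning-rate schedule. All values below come from the quoted experiment setup except `GAMMA`, the step-decay factor, which the paper does not report and which is an assumed placeholder here:

```python
# Hedged sketch of the reported DHPV training configuration.
# Every value except GAMMA is taken from the paper's stated setup.

CONFIG = {
    "lstm_hidden_dim": 512,   # hidden size for all LSTM cells
    "lambda_s": 5.0,          # sentence-level hyper-parameter λs
    "lambda_w": 1.0,          # word-level hyper-parameter λw
    "max_sentences": 6,       # at most 6 sentences per paragraph
    "max_words": 30,          # at most 30 words per sentence
    "base_lr": 1e-4,          # initial SGD learning rate
    "decay_every": 3,         # learning rate steps every 3 epochs
    "rollouts_K": 20,         # rollout sequences for word-level reward
    "pretrain_epochs": 50,    # cross-entropy pre-training epochs
}

GAMMA = 0.5  # ASSUMED decay factor; not reported in the paper


def learning_rate(epoch: int) -> float:
    """Stepped decay: hold base_lr for the first `decay_every` epochs,
    then multiply by GAMMA once every `decay_every` epochs."""
    steps = epoch // CONFIG["decay_every"]
    return CONFIG["base_lr"] * (GAMMA ** steps)
```

With `GAMMA = 0.5`, epochs 0-2 train at 1e-4, epochs 3-5 at 5e-5, and so on; only the 3-epoch step interval and the 1e-4 starting point are grounded in the paper.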