Densely Supervised Hierarchical Policy-Value Network for Image Paragraph Generation
Authors: Siying Wu, Zheng-Jun Zha, Zilei Wang, Houqiang Li, Feng Wu
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the Stanford image-paragraph benchmark have demonstrated the effectiveness of the proposed DHPV approach with performance improvements over multiple state-of-the-art methods. |
| Researcher Affiliation | Academia | Siying Wu, Zheng-Jun Zha, Zilei Wang, Houqiang Li and Feng Wu, National Engineering Laboratory for Brain-inspired Intelligence Technology and Application, University of Science and Technology of China. wsy315@mail.ustc.edu.cn, {zhazj,zlwang,lihq,fengwu}@ustc.edu.cn |
| Pseudocode | No | The paper describes the model architecture and training process using mathematical equations and figures, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We conduct experiments on the Stanford image-paragraph dataset released in [Krause et al., 2017] |
| Dataset Splits | Yes | The dataset was split into training, testing and validation subsets with 14,575 images, 2,489 images and 2,487 images, respectively [Krause et al., 2017]. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries). |
| Experiment Setup | Yes | We set the dimension of hidden layers as 512 for all the LSTM cells in our network. We set the hyper-parameters λs and λw as 5.0 and 1.0, respectively. Following the common setting [Krause et al., 2017], we generate 6 sentences at most, with a maximum of 30 words each, to describe a given image. We use an SGD solver with an initial learning rate of 1e-4 to train the network for the first 3 epochs. Then the learning rate is decayed in steps every 3 training epochs. In addition, we sample K = 20 rollout sequences in word-level reward computation. We first pre-train the hierarchical policy network for 50 epochs with the cross-entropy loss in Eq. (17). |
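The reported setup can be collected into a small configuration sketch with the stepped learning-rate schedule. All values below come from the quoted experiment setup except `GAMMA`, the step-decay factor, which the paper does not report and which is an assumed placeholder here:

```python
# Hedged sketch of the reported DHPV training configuration.
# Every value except GAMMA is taken from the paper's stated setup.

CONFIG = {
    "lstm_hidden_dim": 512,   # hidden size for all LSTM cells
    "lambda_s": 5.0,          # sentence-level hyper-parameter λs
    "lambda_w": 1.0,          # word-level hyper-parameter λw
    "max_sentences": 6,       # at most 6 sentences per paragraph
    "max_words": 30,          # at most 30 words per sentence
    "base_lr": 1e-4,          # initial SGD learning rate
    "decay_every": 3,         # learning rate steps every 3 epochs
    "rollouts_K": 20,         # rollout sequences for word-level reward
    "pretrain_epochs": 50,    # cross-entropy pre-training epochs
}

GAMMA = 0.5  # ASSUMED decay factor; not reported in the paper


def learning_rate(epoch: int) -> float:
    """Stepped decay: hold base_lr for the first `decay_every` epochs,
    then multiply by GAMMA once every `decay_every` epochs."""
    steps = epoch // CONFIG["decay_every"]
    return CONFIG["base_lr"] * (GAMMA ** steps)
```

With `GAMMA = 0.5`, epochs 0-2 train at 1e-4, epochs 3-5 at 5e-5, and so on; only the 3-epoch step interval and the 1e-4 starting point are grounded in the paper.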