Object Relation Attention for Image Paragraph Captioning

Authors: Li-Chuan Yang, Chih-Yuan Yang, Jane Yung-jen Hsu (pp. 3136-3144)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that the proposed network extracts effective object features for image paragraph captioning and achieves promising performance against existing methods. We evaluate the proposed method on the Stanford paragraph dataset (Krause et al. 2017), which contains 19551 image/paragraph pairs, split into training/validation/test sets containing 14575/2487/2489 pairs, respectively. We use 6 metrics CIDEr (Vedantam, Zitnick, and Parikh 2015), METEOR (Banerjee and Lavie 2005), BLEU-1, BLEU-2, BLEU-3, and BLEU-4 (Papineni et al. 2002) as the literature (Krause et al. 2017; Liang et al. 2017; Chatterjee and Schwing 2018; Melas-Kyriazi, Rush, and Han 2018). (A metric-computation sketch follows after this table.)
Researcher Affiliation | Academia | Li-Chuan Yang (1), Chih-Yuan Yang (1,2), and Jane Yung-jen Hsu (1,2); (1) Computer Science and Information Engineering, National Taiwan University; (2) NTU IoX Center, National Taiwan University; {r07922100,yangchihyuan,yjhsu}@ntu.edu.tw
Pseudocode | No | No explicit pseudocode or algorithm blocks found. The paper describes the architecture and steps in text and flowcharts, but not as structured pseudocode.
Open Source Code | No | The paper contains no statement about releasing the authors' own source code and no link to a repository for the described methodology. The acknowledgement section expresses appreciation for 'open-source implementations' by others, not for the authors' own release.
Open Datasets | Yes | We evaluate the proposed method on the Stanford paragraph dataset (Krause et al. 2017), which contains 19551 image/paragraph pairs, split into training/validation/test sets containing 14575/2487/2489 pairs, respectively.
Dataset Splits | Yes | We evaluate the proposed method on the Stanford paragraph dataset (Krause et al. 2017), which contains 19551 image/paragraph pairs, split into training/validation/test sets containing 14575/2487/2489 pairs, respectively. (A split-check sketch follows after this table.)
Hardware Specification | Yes | We train our model on a machine equipped with a 3.7 GHz 12-core CPU and an NVIDIA GTX 1080 Ti GPU.
Software Dependencies | No | The paper mentions using a "publicly available Faster R-CNN implementation" and the "Adam optimizer" but does not provide specific version numbers for any software, libraries, or dependencies.
Experiment Setup | Yes | To train our models, we use the Adam optimizer with a learning rate initialized as 5 × 10⁻⁴ and decaying 20% every two epochs. We manually set the attention hyperparameter c as 2 because we find the proposed method converges well and performs stably when the value is between 1 and 3. We set the training batch size as 10. The configuration of overlapping objects and asymmetric features consumes 2.3 GB of GPU memory and takes 16 hours to run 80 epochs, including the first 30 cross-entropy epochs and the following 50 SCST epochs. (A training-schedule sketch follows below.)
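
The Research Type row above lists the six evaluation metrics (BLEU-1 through BLEU-4, METEOR, and CIDEr). The paper does not name its evaluation toolkit, so the following is a minimal sketch assuming the commonly used pycocoevalcap package, where gts and res map each image id to a list of reference and candidate paragraphs.

    # Sketch only: pycocoevalcap is an assumption, not named in the paper.
    from pycocoevalcap.bleu.bleu import Bleu
    from pycocoevalcap.cider.cider import Cider
    from pycocoevalcap.meteor.meteor import Meteor

    def evaluate(gts, res):
        """Score candidate paragraphs `res` against reference paragraphs `gts`."""
        scores = {}
        bleu, _ = Bleu(4).compute_score(gts, res)                # BLEU-1..4
        scores.update({f"BLEU-{i + 1}": b for i, b in enumerate(bleu)})
        scores["METEOR"], _ = Meteor().compute_score(gts, res)   # needs a Java runtime
        scores["CIDEr"], _ = Cider().compute_score(gts, res)
        return scores

    # Toy usage with a single image id and pre-tokenized, lowercased text.
    gts = {"1": ["a man rides a brown horse along the beach ."]}
    res = {"1": ["a man is riding a horse on the beach ."]}
    print(evaluate(gts, res))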
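
For the Open Datasets and Dataset Splits rows, the sketch below checks the reported 14575/2487/2489 split sizes. The file names (train_split.json, val_split.json, test_split.json) follow the split files distributed with the Stanford paragraph dataset of Krause et al. (2017) and are an assumption here, not quoted from the paper.

    # Sketch only: verify the Stanford paragraph dataset split sizes.
    import json

    def load_ids(path):
        # Each split file is assumed to hold a JSON list of image ids.
        with open(path) as f:
            return set(json.load(f))

    train_ids = load_ids("train_split.json")
    val_ids = load_ids("val_split.json")
    test_ids = load_ids("test_split.json")

    # Expected sizes from the paper: 14575 / 2487 / 2489 (19551 in total).
    print(len(train_ids), len(val_ids), len(test_ids))
    assert len(train_ids) + len(val_ids) + len(test_ids) == 19551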
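
For the Experiment Setup row, the sketch below mirrors the quoted optimization schedule: Adam with a learning rate of 5 × 10⁻⁴ decaying 20% every two epochs, 30 cross-entropy epochs followed by 50 SCST epochs, and a data loader built with batch size 10. PyTorch is assumed (the paper does not name its framework), and train_xe_epoch / train_scst_epoch are hypothetical placeholders for the authors' training loops.

    # Sketch only: PyTorch and the per-epoch functions are assumptions.
    import torch

    def train_xe_epoch(model, loader, optimizer):
        """Placeholder for one epoch of word-level cross-entropy training."""

    def train_scst_epoch(model, loader, optimizer):
        """Placeholder for one epoch of SCST training with a CIDEr reward."""

    def train(model, loader):  # loader assumed to use batch_size=10
        optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
        # "decaying 20% every two epochs" -> multiply the lr by 0.8 every 2 epochs.
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.8)
        for epoch in range(80):  # 80 epochs total, as reported
            if epoch < 30:
                train_xe_epoch(model, loader, optimizer)
            else:
                train_scst_epoch(model, loader, optimizer)
            scheduler.step()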