Uncertainty-Aware Image Captioning

Authors: Zhengcong Fei, Mingyuan Fan, Li Zhu, Junshi Huang, Xiaoming Wei, Xiaolin Wei

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the MS COCO benchmark reveal that our approach outperforms the strong baseline and related methods on both captioning quality as well as decoding speed.
Researcher Affiliation | Industry | Meituan, Beijing, China. {feizhengcong, fanmingyuan, zhuli09, huangjunshi}@meituan.com; {weixiaoming, weixiaolin02}@meituan.com
Pseudocode | Yes | Algorithm 1: DP-based Training Data Pair Construction
Open Source Code | Yes | In particular, to improve reproducibility and foster new research in the field, we publicly release the source code and trained models of all experiments.
Open Datasets | Yes | Dataset. We evaluate our proposed method on MS COCO (Chen et al. 2015), which is a standard benchmark for image captioning tasks. To be consistent with previous works (Huang et al. 2019; Cornia et al. 2020), we adopted the Karpathy split (Karpathy and Fei-Fei 2015) that contains 113,287 training images equipped with five human-annotated sentences each and 5,000 images for validation and test splits, respectively.
Dataset Splits | Yes | To be consistent with previous works (Huang et al. 2019; Cornia et al. 2020), we adopted the Karpathy split (Karpathy and Fei-Fei 2015) that contains 113,287 training images equipped with five human-annotated sentences each and 5,000 images for validation and test splits, respectively.
Hardware Specification | Yes | The decoding time for speedup estimation is measured on a single image without minibatching and feature extraction, averaged over the whole test split with a 32G V100 GPU.
Software Dependencies | No | The paper mentions using the Adam optimizer, but it does not specify version numbers for any software dependencies like programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | Specifically, the number of stacked blocks is 3, the hidden size is 512, and the feed-forward filter size is 2048 with a 0.2 dropout rate. During training, we train the UAIC model for 15 epochs with an initial learning rate of 3e-5 and decay it by 0.9 every five epochs with the combined loss presented in Equation 9 (He et al. 2019). The Adam (Kingma and Ba 2014) optimizer with a 3000-step warm-up trick is employed.
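
For the Open Datasets / Dataset Splits rows above, the following is a minimal sketch of how the Karpathy split is typically read. It assumes the widely circulated dataset_coco.json file from Karpathy and Fei-Fei (2015); the file name and field names ("images", "split", "sentences", "raw", "filename") are assumptions about that file, not artifacts released with this paper.

```python
# Sketch only: load the Karpathy COCO split and group images by split.
import json
from collections import defaultdict

with open("dataset_coco.json") as f:          # assumed split file from Karpathy & Fei-Fei (2015)
    data = json.load(f)

splits = defaultdict(list)
for img in data["images"]:
    # 'restval' images are conventionally folded into training,
    # which is how the 113,287 / 5,000 / 5,000 counts arise.
    split = "train" if img["split"] == "restval" else img["split"]
    captions = [s["raw"] for s in img["sentences"]]  # five human-annotated captions per image
    splits[split].append((img["filename"], captions))

print({name: len(items) for name, items in splits.items()})
```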
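The Experiment Setup row pins down the optimization schedule (15 epochs, initial learning rate 3e-5, decay by 0.9 every five epochs, Adam with a 3000-step warm-up). Below is a minimal sketch of that schedule, assuming PyTorch; the model, data, and loss are toy placeholders with the stated hidden size, feed-forward size, and dropout, not the released UAIC code or its Equation 9 loss.

```python
# Sketch only: warm-up plus step-decay learning-rate schedule as reported in the paper.
import torch

base_lr = 3e-5              # initial learning rate (paper)
warmup_steps = 3000         # Adam warm-up steps (paper)
num_epochs = 15             # training epochs (paper)
decay, decay_every = 0.9, 5 # multiply lr by 0.9 every five epochs (paper)

model = torch.nn.Sequential(
    torch.nn.Linear(512, 2048), torch.nn.ReLU(),
    torch.nn.Dropout(0.2), torch.nn.Linear(2048, 512),
)  # placeholder with the paper's hidden size (512), filter size (2048), dropout (0.2)
optimizer = torch.optim.Adam(model.parameters(), lr=base_lr)

step = 0
for epoch in range(num_epochs):
    for _ in range(100):                        # placeholder for the real data loader
        step += 1
        # linear warm-up over the first 3000 steps, then 0.9 decay every 5 epochs
        lr = base_lr * min(1.0, step / warmup_steps) * decay ** (epoch // decay_every)
        for group in optimizer.param_groups:
            group["lr"] = lr

        x = torch.randn(8, 512)                 # dummy batch; the real objective is Eq. 9
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```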