Uncertainty-Aware Image Captioning
Authors: Zhengcong Fei, Mingyuan Fan, Li Zhu, Junshi Huang, Xiaoming Wei, Xiaolin Wei
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the MS COCO benchmark reveal that our approach outperforms the strong baseline and related methods on both captioning quality as well as decoding speed. |
| Researcher Affiliation | Industry | Meituan, Beijing, China; {feizhengcong, fanmingyuan, zhuli09, huangjunshi}@meituan.com; {weixiaoming, weixiaolin02}@meituan.com |
| Pseudocode | Yes | Algorithm 1: DP-based Training Data Pair Construction |
| Open Source Code | Yes | In particular, to improve reproducibility and foster new research in the field, we publicly release the source code and trained models of all experiments. |
| Open Datasets | Yes | Dataset. We evaluate our proposed method on MS COCO (Chen et al. 2015), which is a standard benchmark for image captioning tasks. To be consistent with previous works (Huang et al. 2019; Cornia et al. 2020), we adopted the Karpathy split (Karpathy and Fei-Fei 2015) that contains 113,287 training images equipped with five human-annotated sentences each and 5,000 images for validation and test splits, respectively. |
| Dataset Splits | Yes (see split-loading sketch below the table) | To be consistent with previous works (Huang et al. 2019; Cornia et al. 2020), we adopted the Karpathy split (Karpathy and Fei-Fei 2015) that contains 113,287 training images equipped with five human-annotated sentences each and 5,000 images for validation and test splits, respectively. |
| Hardware Specification | Yes (see timing sketch below the table) | The decoding time for speedup estimation is measured on a single image without minibatching and feature extraction, averaged over the whole test split with a 32 GB V100 GPU. |
| Software Dependencies | No | The paper mentions using the Adam optimizer, but it does not specify version numbers for any software dependencies like programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes (see configuration sketch below the table) | Specifically, the number of stacked blocks is 3, the hidden size is 512, and the feed-forward filter size is 2048 with a 0.2 dropout rate. During training, we train the UAIC model for 15 epochs with an initial learning rate of 3e-5 and decay it by 0.9 every five epochs with the combined loss presented in Equation 9 (He et al. 2019). The Adam (Kingma and Ba 2014) optimizer with a 3000-step warm-up trick is employed. |
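The Dataset Splits row above quotes the Karpathy split counts (113,287 training images with five captions each, 5,000 validation, 5,000 test). A minimal sketch of partitioning MS COCO with that split is shown below; it assumes the widely used `dataset_coco.json` file layout from Karpathy and Fei-Fei (2015), which the paper itself does not describe.

```python
import json
from collections import defaultdict

def load_karpathy_splits(path="dataset_coco.json"):
    """Group MS COCO images by Karpathy split (assumes the conventional file layout)."""
    with open(path) as f:
        data = json.load(f)

    splits = defaultdict(list)
    for img in data["images"]:
        # "restval" images are conventionally folded into training, which yields
        # the 113,287 / 5,000 / 5,000 counts quoted above.
        split = "train" if img["split"] == "restval" else img["split"]
        captions = [s["raw"] for s in img["sentences"]]
        splits[split].append({"filename": img["filename"], "captions": captions})
    return splits

if __name__ == "__main__":
    splits = load_karpathy_splits()
    for name in ("train", "val", "test"):
        print(name, len(splits[name]))  # expected: 113287, 5000, 5000
```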
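The Hardware Specification row describes the latency protocol: decode one image at a time, exclude feature extraction, and average over the whole test split on a V100. A hedged PyTorch sketch of that measurement follows; `model.decode` and `test_features` are hypothetical names, since the released code may structure this differently.

```python
import time
import torch

@torch.no_grad()
def average_decoding_time(model, test_features, device="cuda"):
    """Average per-image caption decoding latency, excluding feature extraction."""
    model.eval()
    total = 0.0
    for feats in test_features:      # pre-extracted features, one image at a time
        feats = feats.to(device)
        torch.cuda.synchronize()     # finish any pending GPU work before timing
        start = time.perf_counter()
        _ = model.decode(feats)      # caption generation only
        torch.cuda.synchronize()     # wait for decoding kernels to complete
        total += time.perf_counter() - start
    return total / len(test_features)
```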
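The Experiment Setup row lists the main hyperparameters (3 stacked blocks, hidden size 512, feed-forward size 2048, dropout 0.2, 15 epochs, initial learning rate 3e-5 decayed by 0.9 every five epochs, Adam with 3,000 warm-up steps). A minimal PyTorch sketch of the optimizer and schedule is given below; the number of attention heads, the steps-per-epoch value, and the exact way warm-up and decay are combined are assumptions not stated in the quoted text.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR

# Stand-in for the UAIC captioner: 3 encoder/decoder blocks, hidden size 512,
# feed-forward size 2048, dropout 0.2 (number of heads is an assumption).
model = torch.nn.Transformer(
    d_model=512, nhead=8, num_encoder_layers=3, num_decoder_layers=3,
    dim_feedforward=2048, dropout=0.2, batch_first=True,
)

optimizer = Adam(model.parameters(), lr=3e-5)   # initial learning rate 3e-5

WARMUP_STEPS = 3000       # "3000-step warm-up trick"
STEPS_PER_EPOCH = 10_000  # assumption; depends on batch size, which is not quoted

def lr_lambda(step):
    # Linear warm-up over the first 3,000 steps, then scale the base rate by
    # 0.9 every five epochs, as quoted above; one plausible reading of how the
    # warm-up and the epoch-wise decay interact.
    if step < WARMUP_STEPS:
        return step / max(1, WARMUP_STEPS)
    epoch = step // STEPS_PER_EPOCH
    return 0.9 ** (epoch // 5)

scheduler = LambdaLR(optimizer, lr_lambda)
```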