Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Uncertainty-Aware Image Captioning
Authors: Zhengcong Fei, Mingyuan Fan, Li Zhu, Junshi Huang, Xiaoming Wei, Xiaolin Wei
AAAI 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the MS COCO benchmark reveal that our approach outperforms the strong baseline and related methods on both captioning quality as well as decoding speed. |
| Researcher Affiliation | Industry | Meituan, Beijing, China |
| Pseudocode | Yes | Algorithm 1: DP-based Training Data Pair Construction |
| Open Source Code | Yes | In particular, to improve reproducibility and foster new research in the field, we publicly release the source code and trained models of all experiments. |
| Open Datasets | Yes | Dataset. We evaluate our proposed method on MS COCO (Chen et al. 2015), which is a standard benchmark for image captioning tasks. To be consistent with previous works, (Huang et al. 2019; Cornia et al. 2020), we adopted the Karpathy split (Karpathy and Fei-Fei 2015) that contains 113,287 training images equipped with five human-annotated sentences each and 5,000 images for validation and test splits, respectively. |
| Dataset Splits | Yes | To be consistent with previous works, (Huang et al. 2019; Cornia et al. 2020), we adopted the Karpathy split (Karpathy and Fei-Fei 2015) that contains 113,287 training images equipped with five human-annotated sentences each and 5,000 images for validation and test splits, respectively. |
| Hardware Specification | Yes | The decoding time for speedup estimation is measured on a single image without minibatching and feature extraction, averaged over the whole test split with a 32G V100 GPU. |
| Software Dependencies | No | The paper mentions using the Adam optimizer, but it does not specify version numbers for any software dependencies like programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Specifically, the number of stacked blocks is 3, the hidden size is 512, and feed-forward filter size is 2048 with a 0.2 dropout rate. During training, we train the UAIC model for 15 epochs with an initial learning rate of 3e-5 and decay it by 0.9 every five epochs with the combined loss presented in Equation 9 (He et al. 2019). Adam (Kingma and Ba 2014) optimizer with 3000 steps warm-up trick is employed. |
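
The Experiment Setup row quotes enough hyperparameters to sketch a training configuration and learning-rate schedule. The following is a minimal illustration, not the authors' code: the dictionary keys, the function name `learning_rate`, the linear shape of the warm-up, and the `steps_per_epoch` value are all assumptions, since the quoted text gives only the warm-up length (3000 steps), the initial rate (3e-5), and the 0.9 decay every five epochs.

```python
# Values quoted in the Experiment Setup row; the key names are hypothetical.
CONFIG = {
    "num_blocks": 3,          # stacked blocks
    "hidden_size": 512,
    "ffn_size": 2048,         # feed-forward filter size
    "dropout": 0.2,
    "epochs": 15,
    "base_lr": 3e-5,          # initial learning rate
    "lr_decay": 0.9,          # multiplicative decay factor
    "decay_every_epochs": 5,
    "warmup_steps": 3000,
}


def learning_rate(step, steps_per_epoch, cfg=CONFIG):
    """Return the learning rate at a zero-based optimizer step.

    Assumption: warm-up is linear from near zero to base_lr; the source
    only says a "3000 steps warm-up trick" is used, not its shape.
    """
    if steps_per_epoch <= 0:
        raise ValueError("steps_per_epoch must be positive")
    # Linear warm-up factor, capped at 1.0 once warm-up is finished.
    warmup = min(1.0, (step + 1) / cfg["warmup_steps"])
    # Step decay: multiply by lr_decay once per completed block of epochs.
    epoch = step // steps_per_epoch
    num_decays = epoch // cfg["decay_every_epochs"]
    return cfg["base_lr"] * warmup * cfg["lr_decay"] ** num_decays


if __name__ == "__main__":
    # steps_per_epoch is an illustrative placeholder; the real value depends
    # on the batch size, which the quoted text does not state.
    steps_per_epoch = 1000
    for epoch in range(CONFIG["epochs"]):
        last_step = (epoch + 1) * steps_per_epoch - 1
        lr = learning_rate(last_step, steps_per_epoch)
        print(f"epoch {epoch:2d}: lr = {lr:.3e}")
```

With PyTorch, the same function could be wrapped as a multiplier for `torch.optim.lr_scheduler.LambdaLR`, but the table reports no framework or version, so the sketch stays framework-free.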