End-to-End Transformer Based Model for Image Captioning

Authors: Yiyu Wang, Jungang Xu, Yingfei Sun

AAAI 2022, pp. 2585-2594

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To validate the effectiveness of our proposed model, we conduct experiments on MSCOCO dataset. The experimental results compared to existing published works demonstrate that our model achieves new state-of-the-art performances of 138.2% (single model) and 141.0% (ensemble of 4 models) CIDEr scores on Karpathy offline test split and 136.0% (c5) and 138.3% (c40) CIDEr scores on the official online test server.
Researcher Affiliation | Academia | Yiyu Wang¹, Jungang Xu²*, Yingfei Sun¹; ¹School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences; ²School of Computer Science and Technology, University of Chinese Academy of Sciences
Pseudocode | No | The paper presents architectural diagrams and mathematical formulations but does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | Trained models and source code will be released.
Open Datasets | Yes | We conduct experiments on the MSCOCO 2014 dataset (Lin et al. 2014), which contains 123287 images (82783 for training and 40504 for validation), and each is annotated with 5 reference captions.
Dataset Splits | Yes | We conduct experiments on the MSCOCO 2014 dataset (Lin et al. 2014), which contains 123287 images (82783 for training and 40504 for validation)... In this paper, we follow the Karpathy split (Karpathy and Fei-Fei 2017) to redivide the MSCOCO, where 113287 images for training, 5000 images for validation and 5000 images for offline evaluation. (see the split-check sketch after this table)
Hardware Specification | No | No specific hardware details (such as GPU models, CPU types, or memory amounts) are mentioned in the paper.
Software Dependencies | No | The paper mentions using the Adam optimizer but does not specify any software libraries, frameworks, or their version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | We set the model embedding size D to 512, the number of transformer heads to 8, the number of blocks N for both refining encoder and decoder to 3. For the training process, we first train our model under XE loss L_XE for 20 epochs, and set the batch size to 10 and warmup steps to 10,000; then we train our model under L_R for another 30 epochs with fixed learning rate of 5 × 10^-6. We adopt Adam (Kingma and Ba 2015) optimizer in both above stages and the beam size is set to 5 in validation and evaluation process. (see the setup sketch after this table)
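
For the dataset-split row, the reported sizes can be sanity-checked with a short Python sketch. This is not from the paper: the "dataset_coco.json" file name and its "images"/"split" fields refer to the commonly distributed Karpathy split file and are assumptions, not something the authors specify.

# Minimal sketch (not from the paper): checking the Karpathy split sizes,
# assuming the widely used "dataset_coco.json" split file; the file name and
# field names below are assumptions about that file, not the authors' code.
import json
from collections import Counter

with open("dataset_coco.json") as f:   # hypothetical local path
    images = json.load(f)["images"]

counts = Counter(img["split"] for img in images)
# The paper's 113287 training images correspond to "train" + "restval" in
# this split file; "val" and "test" should each contain 5000 images.
print(counts["train"] + counts["restval"], counts["val"], counts["test"])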
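
For the experiment-setup row, a minimal sketch of how the quoted hyperparameters could be wired up is shown below. The numbers (512-d embeddings, 8 heads, 3 blocks, 20 XE epochs, 30 RL epochs, batch size 10, 10,000 warmup steps, fixed 5 × 10^-6 RL learning rate, beam size 5) are quoted from the paper; the torch.nn.Transformer stand-in, the Noam-style warmup form, the Adam betas/eps, and the self-critical reading of L_R are assumptions, since the authors' code is not yet released.

# Illustrative sketch only: the constants are quoted from the paper, but the
# model stand-in, warmup schedule form, and Adam betas/eps are assumptions.
import torch

D_MODEL, N_HEADS, N_BLOCKS = 512, 8, 3        # embedding size, heads, encoder/decoder blocks
XE_EPOCHS, RL_EPOCHS = 20, 30                 # cross-entropy stage, then CIDEr-RL stage
BATCH_SIZE, WARMUP_STEPS, RL_LR, BEAM = 10, 10_000, 5e-6, 5

model = torch.nn.Transformer(d_model=D_MODEL, nhead=N_HEADS,
                             num_encoder_layers=N_BLOCKS,
                             num_decoder_layers=N_BLOCKS)   # stand-in for the captioning model

# Stage 1: XE training with Adam and warmup (Noam-style schedule assumed).
xe_optimizer = torch.optim.Adam(model.parameters(), lr=1.0,
                                betas=(0.9, 0.98), eps=1e-9)
xe_scheduler = torch.optim.lr_scheduler.LambdaLR(
    xe_optimizer,
    lr_lambda=lambda step: (D_MODEL ** -0.5) *
        min(max(step, 1) ** -0.5, max(step, 1) * WARMUP_STEPS ** -1.5))

# Stage 2: training under L_R (read here as CIDEr-based self-critical
# reinforcement learning) at the fixed learning rate quoted in the paper.
rl_optimizer = torch.optim.Adam(model.parameters(), lr=RL_LR)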