Dual-level Collaborative Transformer for Image Captioning

Authors: Yunpeng Luo, Jiayi Ji, Xiaoshuai Sun, Liujuan Cao, Yongjian Wu, Feiyue Huang, Chia-Wen Lin, Rongrong Ji

AAAI 2021 | pp. 2286-2293

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To validate our model, we conduct extensive experiments on the highly competitive MS-COCO dataset, and achieve new state-of-the-art performance on both local and online test sets, i.e., 133.8% CIDEr on Karpathy split and 135.4% CIDEr on the official split.
Researcher Affiliation | Collaboration | Yunpeng Luo (1), Jiayi Ji (1), Xiaoshuai Sun (1*), Liujuan Cao (1), Yongjian Wu (3), Feiyue Huang (3), Chia-Wen Lin (4), Rongrong Ji (1,2); 1 Media Analytics and Computing Lab, School of Informatics, Xiamen University; 2 Institute of Artificial Intelligence, Xiamen University; 3 Tencent Youtu Lab; 4 National Tsing Hua University
Pseudocode | No | The paper provides mathematical formulations and descriptions of its components but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements about the release of its source code or a link to a code repository.
Open Datasets | Yes | We conduct our experiments on the benchmark image captioning dataset COCO (Lin et al. 2014).
Dataset Splits | Yes | For offline evaluation, we follow the widely adopted Karpathy split (Karpathy and Fei-Fei 2015), where 113,287, 5,000, and 5,000 images are used for training, validation, and testing, respectively. (A minimal split-loading sketch follows the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, or cloud instance specifications) used for running the experiments; it only refers to model architectures such as ResNet-101.
Software Dependencies | No | The paper mentions software components such as the Adam optimizer and Faster R-CNN but does not provide specific version numbers for any libraries, frameworks, or other software dependencies required to replicate the experiment.
Experiment Setup | Yes | In our implementation, we set d_model to 512 and the number of heads to 8. The number of layers for both encoder and decoder is set to 3. In the XE pre-training stage, we warm up our model for 4 epochs with the learning rate linearly increased to 1×10⁻⁴. Then we set the learning rate to 1×10⁻⁴ for epochs 5-10, 2×10⁻⁶ for epochs 11-12, and 4×10⁻⁷ afterwards. The batch size is set to 50. After the 18-epoch XE pre-training stage, we start to optimize our model with the CIDEr reward, using a 5×10⁻⁶ learning rate and a batch size of 100. We use the Adam optimizer in both stages, and the beam size is set to 5. (A training-schedule sketch follows the table.)
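The Karpathy split referenced in the Dataset Splits row is usually distributed as a single annotation file (commonly named dataset_coco.json). The sketch below shows one typical way to materialize the 113,287 / 5,000 / 5,000 counts from that file; the file name, field layout, and the convention of folding the "restval" images into training are assumptions based on common practice, not details stated in the paper.

```python
import json
from collections import defaultdict

def load_karpathy_split(path="dataset_coco.json"):
    """Group COCO images by the Karpathy split stored in the annotation file.

    Assumes the widely used layout: {"images": [{"filename": ..., "split": ...}, ...]}.
    The 'restval' portion is conventionally merged into training, which is how the
    113,287 / 5,000 / 5,000 train/val/test counts quoted above arise.
    """
    with open(path) as f:
        data = json.load(f)

    splits = defaultdict(list)
    for img in data["images"]:
        split = img["split"]
        if split == "restval":   # fold restval into train (common convention)
            split = "train"
        splits[split].append(img["filename"])
    return splits

if __name__ == "__main__":
    splits = load_karpathy_split()
    for name in ("train", "val", "test"):
        print(name, len(splits[name]))   # expected: 113287 / 5000 / 5000
```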
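The optimization schedule quoted in the Experiment Setup row maps naturally onto a piecewise learning-rate function. The following is a minimal PyTorch sketch of that two-stage schedule, assuming a placeholder model and a LambdaLR scheduler for the XE stage; the rate values and batch/beam settings come from the quoted setup, while the dummy module, optimizer defaults, and loop skeleton are illustrative assumptions rather than the authors' implementation.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR

BASE_LR = 1e-4  # peak learning rate reported for the XE stage

def xe_lr_scale(epoch: int) -> float:
    """Piecewise schedule from the reported setup (epochs counted from 0).

    epochs 0-3 : linear warm-up to 1e-4
    epochs 4-9 : 1e-4
    epochs 10-11: 2e-6
    epoch 12+  : 4e-7
    Returned values are multipliers applied to BASE_LR.
    """
    if epoch < 4:
        return (epoch + 1) / 4.0
    if epoch < 10:
        return 1.0
    if epoch < 12:
        return 2e-6 / BASE_LR
    return 4e-7 / BASE_LR

# Placeholder for the captioning network (d_model=512, 8 heads, 3 encoder/decoder
# layers as reported); a dummy linear layer keeps the sketch runnable.
model = torch.nn.Linear(512, 512)

# Stage 1: cross-entropy (XE) pre-training, batch size 50, 18 epochs.
optimizer = Adam(model.parameters(), lr=BASE_LR)
scheduler = LambdaLR(optimizer, lr_lambda=xe_lr_scale)
for epoch in range(18):
    # ... run one XE epoch over the training split here ...
    scheduler.step()

# Stage 2: CIDEr (self-critical) optimization with a fixed rate,
# batch size 100, beam size 5 at decoding time.
for group in optimizer.param_groups:
    group["lr"] = 5e-6
```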