S2 Transformer for Image Captioning

Authors: Pengpeng Zeng, Haonan Zhang, Jingkuan Song, Lianli Gao

Venue: IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the MSCOCO benchmark demonstrate that our method achieves a new state-of-the-art performance without bringing excessive parameters compared with the vanilla transformer.
Researcher Affiliation | Academia | School of Computer Science and Engineering and Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Chengdu, China.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code is available at https://github.com/zchoi/S2Transformer.
Open Datasets | Yes | We conduct experiments to verify the effectiveness of our proposed S2 Transformer on the commonly-used image captioning dataset, i.e., MS-COCO.
Dataset Splits | Yes | In offline testing, we follow the setting in [Karpathy and Fei-Fei, 2015], where 113,287 images, 5,000 images, and 5,000 images are used as the train, validation, and test sets, respectively. (See the split-loading sketch below.)
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers.
Experiment Setup | Yes | In practice, our encoder and decoder both have 3 layers, where each layer uses 8 self-attention heads and the inner dimension of FFN is 2,048. The number of cluster centers N is 5 and the hyper-parameter λ = 0.2 in Eq. 9. We employ the Adam optimizer to train all models and set the batch size as 50. For cross-entropy (CE) training, we set the minimum epoch as 15... For self-critical sequence training, the learning rate is fixed at 5e-7. (See the configuration sketch below.)
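For reference, below is a minimal sketch of how the Karpathy split cited in the Dataset Splits row is typically materialized. The file name dataset_coco.json and its field layout are assumptions based on the split annotations commonly distributed for [Karpathy and Fei-Fei, 2015]; the paper itself does not describe the loading code.

```python
# Minimal sketch (not the authors' code): load the Karpathy split for MS-COCO.
# Assumes the commonly distributed "dataset_coco.json" annotation file.
import json
from collections import defaultdict

def load_karpathy_split(path="dataset_coco.json"):
    with open(path) as f:
        data = json.load(f)
    splits = defaultdict(list)
    for img in data["images"]:
        # "restval" images are merged into training, which yields the
        # 113,287 / 5,000 / 5,000 train/val/test partition used in the paper.
        split = "train" if img["split"] in ("train", "restval") else img["split"]
        splits[split].append(img)
    return splits

if __name__ == "__main__":
    splits = load_karpathy_split()
    for name in ("train", "val", "test"):
        print(name, len(splits[name]))  # expected: 113287, 5000, 5000
```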
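The Experiment Setup row can likewise be collected into a single configuration object. The class and attribute names below are illustrative and do not come from the authors' released repository; only the numerical values are taken from the quoted text.

```python
# Hedged sketch of the training configuration reported in the paper.
# Names are illustrative; values follow the Experiment Setup row above.
from dataclasses import dataclass

@dataclass
class S2TransformerConfig:
    num_encoder_layers: int = 3    # encoder depth
    num_decoder_layers: int = 3    # decoder depth
    num_attention_heads: int = 8   # self-attention heads per layer
    ffn_dim: int = 2048            # inner dimension of the FFN
    num_clusters: int = 5          # number of cluster centers N
    lam: float = 0.2               # hyper-parameter lambda in Eq. 9
    batch_size: int = 50
    xe_min_epochs: int = 15        # minimum epochs for cross-entropy training
    scst_lr: float = 5e-7          # fixed LR for self-critical sequence training

config = S2TransformerConfig()
print(config)
```

In practice these values would be passed to the training script (the paper reports Adam as the optimizer for all models); the dataclass is simply a compact way to list the reported settings in one place.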