Attention-Aligned Transformer for Image Captioning

Authors: Zhengcong Fei

AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments conducted on the MS COCO dataset demonstrate that the proposed A2 Transformer consistently outperforms baselines in both automatic metrics and human evaluation.
Researcher Affiliation | Academia | Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; feizhengcong@ict.ac.cn
Pseudocode | No | The paper describes methods using natural language and mathematical equations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Trained models and code for reproducing the experiments are publicly available.
Open Datasets | Yes | All experiments are conducted on the most popular image captioning dataset, MS COCO (Chen et al. 2015).
Dataset Splits | Yes | We follow the common practice of the Karpathy splits (Karpathy and Fei-Fei 2015) for validation of model hyperparameters and offline evaluation. This split contains 113,287 images for training and 5,000 images each for validation and test.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) needed to replicate the experiments.
Experiment Setup | Yes | We set the dimensionality d of each layer to 512 and the number of heads to 8. We employ a dropout rate of 0.1 after each attention and feed-forward layer. The model is first trained to minimize the negative log-likelihood of the training data, following a learning rate scheduling strategy with a warmup of 10,000 steps, and then fine-tuned with the CIDEr score using reinforcement learning (Rennie et al. 2017) at a fixed learning rate of 5 × 10⁻⁶. We train all models with the Adam optimizer (Kingma and Ba 2014), a batch size of 50, and a beam size of 5. We set the hyperparameter η = 0.1 in Equation 7 in all experiments.
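For the Karpathy split quoted in the Dataset Splits row, the sketch below shows one common way to derive the 113,287 / 5,000 / 5,000 partition from the widely used dataset_coco.json split file. The file name and field layout are assumptions about that standard artifact, not details confirmed by the paper.

```python
import json
from collections import defaultdict

# Hypothetical path to the Karpathy split file for MS COCO.
with open("dataset_coco.json") as f:
    data = json.load(f)

splits = defaultdict(list)
for img in data["images"]:
    # "restval" images are conventionally folded into training, which yields
    # the 113,287 / 5,000 / 5,000 partition quoted in the table above.
    split = "train" if img["split"] in ("train", "restval") else img["split"]
    splits[split].append(img["filename"])

print({name: len(files) for name, files in splits.items()})
# expected: {'train': 113287, 'val': 5000, 'test': 5000}
```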
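The hyperparameters in the Experiment Setup row can be collected into a single training configuration. The sketch below is a minimal illustration assuming a PyTorch pipeline with the Transformer warmup schedule of Vaswani et al.; the A2Transformer class, the Adam betas, and all variable names are hypothetical and do not reflect the authors' released code.

```python
import torch  # used by the commented-out optimizer/scheduler lines below

# Values taken from the Experiment Setup row; everything else is assumed.
config = {
    "d_model": 512,           # dimensionality d of each layer
    "num_heads": 8,           # attention heads
    "dropout": 0.1,           # after each attention and feed-forward layer
    "warmup_steps": 10_000,   # warmup for the cross-entropy (XE) stage
    "rl_lr": 5e-6,            # fixed learning rate for CIDEr-based RL fine-tuning
    "batch_size": 50,
    "beam_size": 5,
    "eta": 0.1,               # hyperparameter η in Equation 7
}

def xe_lr_lambda(step: int) -> float:
    """Transformer-style warmup schedule (assumed to follow Vaswani et al.)."""
    step = max(step, 1)
    return config["d_model"] ** -0.5 * min(
        step ** -0.5, step * config["warmup_steps"] ** -1.5
    )

# model = A2Transformer(d_model=config["d_model"], n_heads=config["num_heads"],
#                       dropout=config["dropout"])           # hypothetical class
# xe_optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98))
# xe_scheduler = torch.optim.lr_scheduler.LambdaLR(xe_optimizer, xe_lr_lambda)
# rl_optimizer = torch.optim.Adam(model.parameters(), lr=config["rl_lr"])
```

In this two-stage recipe the warmup schedule governs only the cross-entropy stage, while the CIDEr-based reinforcement learning fine-tuning uses the fixed 5 × 10⁻⁶ learning rate.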