Attention-Aligned Transformer for Image Captioning
Authors: Zhengcong Fei
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments conducted on the MS COCO dataset demonstrate that the proposed A² Transformer consistently outperforms baselines in both automatic metrics and human evaluation. |
| Researcher Affiliation | Academia | 1Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China 2University of Chinese Academy of Sciences, Beijing 100049, China feizhengcong@ict.ac.cn |
| Pseudocode | No | The paper describes methods using natural language and mathematical equations but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Trained models and code for reproducing the experiments are publicly available. |
| Open Datasets | Yes | All the experiments are conducted on the most popular image captioning dataset MS COCO (Chen et al. 2015). |
| Dataset Splits | Yes | We follow the common practice as Karpathy splits (Karpathy and Fei-Fei 2015) for validation of model hyperparameters and offline evaluation. This split contains 113,287 images for training and 5,000 respectively for validation and test. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) that are needed to replicate the experiments. |
| Experiment Setup | Yes | We set the dimensionality d of each layer to 512 and the number of heads to 8. We employ a dropout rate of 0.1 after each attention and feed-forward layer. The model is first trained to minimize the negative log-likelihood of the training data following a learning rate scheduling strategy with a warmup of 10,000 steps, and then fine-tuned on the CIDEr score using Reinforcement Learning (Rennie et al. 2017) with a fixed learning rate of 5 × 10⁻⁶. We train all models using the Adam optimizer (Kingma and Ba 2014), a batch size of 50 and a beam size of 5. We set the hyperparameter η = 0.1 in Equation 7 in all experiments. |
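
The hyperparameters quoted in the Experiment Setup row correspond to a standard two-phase captioning training recipe (cross-entropy pre-training, then CIDEr-based reinforcement fine-tuning). The sketch below shows how they could be wired up in PyTorch; it is illustrative only: `torch.nn.Transformer` is a stand-in for the authors' A² Transformer, and the Noam-style warmup decay is an assumption, since the paper states only that a warmup schedule of 10,000 steps is used.

```python
# Minimal sketch of the reported training configuration, assuming a PyTorch setup.
# torch.nn.Transformer stands in for the authors' A^2 Transformer, which is not
# reproduced here.
import torch

D_MODEL, N_HEADS, DROPOUT = 512, 8, 0.1   # layer width, attention heads, dropout
WARMUP_STEPS = 10_000                     # warmup for the cross-entropy (XE) phase
RL_LR = 5e-6                              # fixed lr for CIDEr-based RL fine-tuning
BATCH_SIZE, BEAM_SIZE = 50, 5             # reported batch size and beam size
ETA = 0.1                                 # hyperparameter eta in the paper's Eq. 7

# Stand-in model with the reported width, head count, and dropout rate.
model = torch.nn.Transformer(d_model=D_MODEL, nhead=N_HEADS, dropout=DROPOUT)

# Phase 1: minimize negative log-likelihood with Adam and a warmup schedule.
# The paper only mentions "a warmup"; the Noam-style decay below is an assumption.
xe_optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9)
xe_scheduler = torch.optim.lr_scheduler.LambdaLR(
    xe_optimizer,
    lambda step: (D_MODEL ** -0.5) * min((step + 1) ** -0.5,
                                         (step + 1) * WARMUP_STEPS ** -1.5),
)

# Phase 2: CIDEr-score optimization (Rennie et al. 2017) at a fixed learning rate.
rl_optimizer = torch.optim.Adam(model.parameters(), lr=RL_LR)
```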