Partially Non-Autoregressive Image Captioning
Authors: Zhengcong Fei
AAAI 2021, pp. 1309-1316
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on MS COCO benchmark demonstrate that our proposed method achieves more than 3.5× speedup while maintaining competitive performance. |
| Researcher Affiliation | Academia | Zhengcong Fei (1,2); (1) Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China; (2) University of Chinese Academy of Sciences, Beijing 100049, China; feizhengcong@ict.ac.cn |
| Pseudocode | No | The paper describes algorithms and methods in text and uses mathematical formulas, but it does not include a clearly labeled pseudocode block or algorithm figure. |
| Open Source Code | Yes | The source code is publicly released on https://github.com/feizc/PNAIC. |
| Open Datasets | Yes | MS COCO (Chen et al. 2015) is a standard benchmark for the image captioning task. We use the Karpathy split (Karpathy and Fei-Fei 2015), which has been employed extensively for reporting results in prior works. |
| Dataset Splits | Yes | This split contains 113,287 training images, each paired with five sentences, and 5,000 images each for the validation and test splits. |
| Hardware Specification | Yes | Latency represents the time to decode a single image averaged over the whole test split, and is tested on a GeForce GTX 1080 Ti GPU. |
| Software Dependencies | No | The paper implicitly references a Transformer model and PyTorch (via citations to Transformer-based models and general deep learning practices), but it does not provide specific version numbers for any software dependencies required to reproduce the experiments. |
| Experiment Setup | Yes | For model hyperparameters, we follow most of the settings in (Vaswani et al. 2017). Specifically, we utilize a base Transformer model (d_model = 512, d_h = 512, n_layer = 6, n_head = 8, p_dropout = 0.1) and linearly anneal the learning rate from 3×10^-4 to 10^-5. The AIC model is trained first with XE loss and then with SCST (Rennie et al. 2017). For PNAIC, we utilize sequence-level distillation (Kim and Rush 2016; Zhou, Neubig, and Gu 2019), which replaces the target sentences in the training dataset with sentences generated by the AIC model, and set the beam size of the technique to 3. (A hedged configuration sketch follows the table.) |
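
The quoted hyperparameters and learning-rate schedule can be expressed as a short PyTorch sketch. This is a minimal illustration, not the authors' released implementation (see https://github.com/feizc/PNAIC for that); the `total_steps` value and the choice of `nn.TransformerDecoder`, `Adam`, and `LinearLR` are assumptions, while the dimensions, dropout, and the 3×10^-4 to 10^-5 annealing endpoints come from the quoted setup.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the paper's experiment setup.
d_model, n_layer, n_head, p_dropout = 512, 6, 8, 0.1

# Stand-in decoder for illustration; the paper's PNAIC architecture is not reproduced here.
decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=n_head, dropout=p_dropout)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=n_layer)

optimizer = torch.optim.Adam(decoder.parameters(), lr=3e-4)

# Linearly anneal the learning rate from 3e-4 down to 1e-5.
total_steps = 100_000  # assumption: the paper's quoted text does not state the step count
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer,
    start_factor=1.0,
    end_factor=(1e-5 / 3e-4),  # final lr = 1e-5
    total_iters=total_steps,
)
```

The XE-then-SCST training stages and the beam-size-3 sequence-level distillation described above are omitted from this sketch.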