Partially Non-Autoregressive Image Captioning
Authors: Zhengcong Fei
AAAI 2021, pp. 1309-1316
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on MS COCO benchmark demonstrate that our proposed method achieves more than 3.5× speedup while maintaining competitive performance. |
| Researcher Affiliation | Academia | Zhengcong Fei (1,2); (1) Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China; (2) University of Chinese Academy of Sciences, Beijing 100049, China; feizhengcong@ict.ac.cn |
| Pseudocode | No | The paper describes algorithms and methods in text and uses mathematical formulas, but it does not include a clearly labeled pseudocode block or algorithm figure. |
| Open Source Code | Yes | The source code is publicly released on https://github.com/feizc/PNAIC. |
| Open Datasets | Yes | MS COCO (Chen et al. 2015) is a standard benchmark for the image captioning task. We use the Karpathy split (Karpathy and Fei-Fei 2015), which has been employed extensively for reporting results in prior works. |
| Dataset Splits | Yes | This split contains 113,287 training images, each paired with five sentences, and 5,000 images each for the validation and test splits. |
| Hardware Specification | Yes | Latency represents the time to decode a single image averaged over the whole test split, and is tested on a GeForce GTX 1080 Ti GPU. |
| Software Dependencies | No | The paper implicitly references a Transformer model and PyTorch (via citations to Transformer-based models and general deep learning practices), but it does not provide specific version numbers for any software dependencies required to reproduce the experiments. |
| Experiment Setup | Yes | For model hyperparameters, we follow most of the settings in (Vaswani et al. 2017). Specifically, we utilize a base Transformer model (d_model = 512, d_h = 512, n_layer = 6, n_head = 8, p_dropout = 0.1) and linearly anneal the learning rate from 3×10^-4 to 10^-5. The AIC model is trained first with XE loss and then with SCST (Rennie et al. 2017). For PNAIC, we utilize sequence-level distillation (Kim and Rush 2016; Zhou, Neubig, and Gu 2019), which replaces the target sentences in the training dataset with sentences generated by the AIC model, and set the beam size of the technique to 3. (A hedged configuration sketch follows the table.) |
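
The quoted hyperparameters and learning-rate schedule can be expressed as a short PyTorch sketch. This is a minimal illustration, not the authors' released implementation (see https://github.com/feizc/PNAIC for that); the `total_steps` value and the choice of `nn.TransformerDecoder`, `Adam`, and `LinearLR` are assumptions, while the dimensions, dropout, and the 3×10^-4 to 10^-5 annealing endpoints come from the quoted setup.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the paper's experiment setup.
d_model, n_layer, n_head, p_dropout = 512, 6, 8, 0.1

# Stand-in decoder for illustration; the paper's PNAIC architecture is not reproduced here.
decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=n_head, dropout=p_dropout)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=n_layer)

optimizer = torch.optim.Adam(decoder.parameters(), lr=3e-4)

# Linearly anneal the learning rate from 3e-4 down to 1e-5.
total_steps = 100_000  # assumption: the paper's quoted text does not state the step count
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer,
    start_factor=1.0,
    end_factor=(1e-5 / 3e-4),  # final lr = 1e-5
    total_iters=total_steps,
)
```

The XE-then-SCST training stages and the beam-size-3 sequence-level distillation described above are omitted from this sketch.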