Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning

Authors: Longteng Guo, Jing Liu, Xinxin Zhu, Xingjian He, Jie Jiang, Hanqing Lu

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on MSCOCO image captioning benchmark show that our NAIC model achieves a performance comparable to state-of-the-art autoregressive models, while brings 13.9× decoding speedup."
Researcher Affiliation | Academia | ¹National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; ²School of Artificial Intelligence, University of Chinese Academy of Sciences
Pseudocode | No | No structured pseudocode or algorithm blocks are present.
Open Source Code | No | No explicit statement or link providing access to source code for the described methodology.
Open Datasets | Yes | MSCOCO dataset. "MSCOCO [Chen et al., 2015] is the most popular benchmark for image captioning. We use the Karpathy splits [Karpathy and Fei-Fei, 2015] that have been used extensively for reporting results in prior works."
Dataset Splits | Yes | "This split contains 113,287 training images with 5 captions each, and 5,000 images for validation and test splits, respectively." (See the split-verification sketch after the table.)
Hardware Specification | Yes | "Latency is the time to decode a single image without minibatching, averaged over the whole test split, and is tested on a GeForce GTX 1080 Ti GPU." (See the timing sketch after the table.)
Software Dependencies | No | No specific software dependencies with version numbers are provided (e.g., specific deep learning frameworks like PyTorch or TensorFlow with their versions).
Experiment Setup | Yes | "Both our NAIC and AIC models closely follow the same model hyper-parameters as Transformer-Base [Vaswani et al., 2017] model. Specifically, the number of stacked blocks L is 6. ... At this training stage, we use an initial learning rate of 7.5 × 10⁻⁵ and decay it by 0.8 every 10 epochs. Both training stages use Adam [Kingma and Ba, 2014] optimizer with a batch size of 50." (See the optimizer-schedule sketch after the table.)
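
The split counts quoted in the Dataset Splits row can be checked directly against the split metadata. A minimal sketch, assuming the conventional `dataset_coco.json` file distributed with the Karpathy and Fei-Fei splits; the filename and JSON field names are conventions of that release, not details stated in the paper:

```python
import json
from collections import Counter

# Assumed file: the Karpathy splits are conventionally distributed as a
# single metadata file named "dataset_coco.json".
with open("dataset_coco.json") as f:
    meta = json.load(f)

# Each image record carries a "split" tag: train, restval, val, or test.
counts = Counter(img["split"] for img in meta["images"])

# The paper's 113,287 training images correspond to the common convention
# of folding the "restval" portion into the training set.
n_train = counts["train"] + counts["restval"]
print(f"train: {n_train}, val: {counts['val']}, test: {counts['test']}")
```

If the file matches the standard release, this should print train: 113287, val: 5000, test: 5000, agreeing with the quoted evidence.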
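The latency protocol in the Hardware Specification row is simple to restate in code: decode one image at a time and average wall-clock time over the test split. A minimal sketch, assuming PyTorch and a hypothetical `model.decode` method; the paper does not name its framework (see the Software Dependencies row):

```python
import time
import torch

@torch.no_grad()
def mean_decode_latency(model, test_images, device="cuda"):
    """Average per-image decoding time in seconds, batch size fixed at 1."""
    model.eval()
    total = 0.0
    for image in test_images:
        batch = image.unsqueeze(0).to(device)  # a single image: no minibatching
        torch.cuda.synchronize()               # flush pending GPU work first
        start = time.perf_counter()
        _ = model.decode(batch)                # hypothetical decoding entry point
        torch.cuda.synchronize()               # wait until decoding finishes
        total += time.perf_counter() - start
    return total / len(test_images)
```

The explicit `torch.cuda.synchronize()` calls matter here: CUDA kernels launch asynchronously, so timing without synchronization would understate the true decoding latency.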
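The Experiment Setup row pins down the optimization recipe: Adam, initial learning rate 7.5 × 10⁻⁵, decayed by a factor of 0.8 every 10 epochs, batch size 50. A minimal sketch of that schedule, assuming PyTorch; the model below is a stand-in, since the paper's actual captioner follows Transformer-Base with L = 6 stacked blocks:

```python
from torch import nn, optim
from torch.optim.lr_scheduler import StepLR

model = nn.Linear(512, 512)  # stand-in for the Transformer-Base captioning model

optimizer = optim.Adam(model.parameters(), lr=7.5e-5)   # initial LR 7.5 × 10⁻⁵
scheduler = StepLR(optimizer, step_size=10, gamma=0.8)  # multiply LR by 0.8 every 10 epochs

for epoch in range(30):
    # ... one epoch over minibatches of 50 image-caption pairs goes here ...
    scheduler.step()
    print(epoch, scheduler.get_last_lr())
```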