Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning
Authors: Longteng Guo, Jing Liu, Xinxin Zhu, Xingjian He, Jie Jiang, Hanqing Lu
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on MSCOCO image captioning benchmark show that our NAIC model achieves a performance comparable to state-of-the-art autoregressive models, while brings 13.9× decoding speedup. |
| Researcher Affiliation | Academia | (1) National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; (2) School of Artificial Intelligence, University of Chinese Academy of Sciences |
| Pseudocode | No | No structured pseudocode or algorithm blocks are present. |
| Open Source Code | No | No explicit statement or link providing access to source code for the described methodology. |
| Open Datasets | Yes | MSCOCO dataset. MSCOCO [Chen et al., 2015] is the most popular benchmark for image captioning. We use the Karpathy splits [Karpathy and Fei-Fei, 2015] that have been used extensively for reporting results in prior works. |
| Dataset Splits | Yes | This split contains 113,287 training images with 5 captions each, and 5,000 images for validation and test splits, respectively. |
| Hardware Specification | Yes | Latency is the time to decode a single image without minibatching, averaged over the whole test split, and is tested on a GeForce GTX 1080 Ti GPU. |
| Software Dependencies | No | No specific software dependencies with version numbers are provided (e.g., specific deep learning frameworks like PyTorch or TensorFlow with their versions). |
| Experiment Setup | Yes | Both our NAIC and AIC models closely follow the same model hyper-parameters as Transformer-Base [Vaswani et al., 2017] model. Specifically, the number of stacked blocks L is 6. ... At this training stage, we use an initial learning rate of 7.5 × 10⁻⁵ and decay it by 0.8 every 10 epochs. Both training stages use Adam [Kingma and Ba, 2014] optimizer with a batch size of 50. |
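
To make the quoted training configuration concrete, the following is a minimal sketch assuming a PyTorch implementation (the paper does not state the framework); the stand-in parameter list, epoch count, and print loop are placeholders for illustration only, not the authors' code.

```python
# Minimal sketch of the reported optimizer and learning-rate schedule:
# Adam, initial lr 7.5e-5, decayed by 0.8 every 10 epochs, batch size 50.
# Assumes PyTorch; a single dummy parameter stands in for the Transformer-Base
# captioning model's parameters.
import torch

params = [torch.nn.Parameter(torch.zeros(1))]  # stand-in for model.parameters()

# Adam optimizer with the reported initial learning rate of 7.5e-5.
optimizer = torch.optim.Adam(params, lr=7.5e-5)

# Decay the learning rate by a factor of 0.8 every 10 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.8)

BATCH_SIZE = 50  # reported training batch size

for epoch in range(30):
    # ... one training epoch over mini-batches of size BATCH_SIZE would go here ...
    optimizer.step()   # placeholder step so the scheduler sees a completed epoch
    scheduler.step()
    if epoch % 10 == 0:
        print(f"epoch {epoch}: lr = {scheduler.get_last_lr()[0]:.2e}")
```

Running the sketch prints the learning rate dropping by a factor of 0.8 at each 10-epoch boundary, matching the schedule described in the Experiment Setup row.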