Exploring and Distilling Cross-Modal Information for Image Captioning
Authors: Fenglin Liu, Xuancheng Ren, Yuanxin Liu, Kai Lei, Xu Sun
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments on COCO image captioning dataset validate our argument and prove the effectiveness of the proposed approach. |
| Researcher Affiliation | Academia | 1Shenzhen Key Lab for Information Centric Networking & Blockchain Technology (ICNLAB), School of Electronics and Computer Engineering (SECE), Peking University 2MOE Key Laboratory of Computational Linguistics, School of EECS, Peking University 3School of ICE, Beijing University of Posts and Telecommunications |
| Pseudocode | No | The paper describes the model architecture and equations but does not contain a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | We evaluate the proposed approach on the widely-used COCO dataset [Chen et al., 2015], which contains 123,287 images. |
| Dataset Splits | Yes | We use the publicly-available splits in [Karpathy and Li, 2015] for offline evaluation. There are 5,000 images each in validation set and test set for COCO. |
| Hardware Specification | Yes | Time and Speed is measured on a single NVIDIA GeForce GTX 1080 Ti. |
| Software Dependencies | No | The paper mentions general software components (e.g., Python, PyTorch, TensorFlow) but does not provide specific version numbers for these or other libraries/solvers. |
| Experiment Setup | Yes | The word embedding size and model size are 256 and 512, respectively, and in implementation, we share the attribute embedding and the input word embedding. The number of heads n in multi-head attention is set to 8 unless otherwise stated. We train the model with both cross-entropy loss and reinforcement learning optimizing CIDEr. The model is trained with batch size of 80 for 25 epochs with early stopping based on CIDEr with cross-entropy loss, followed by reinforcement learning. We use Adam [Kingma and Ba, 2014] with a learning rate of 10^-4 for parameter optimization. We also apply beam search with beam size = 3 during inference. (See the configuration sketch after the table.) |
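
The Experiment Setup row can be summarized as a small configuration object. The sketch below is a minimal, hypothetical PyTorch snippet that collects the reported hyperparameters and builds the Adam optimizer with the stated learning rate; the names `CaptionConfig` and `build_optimizer` and the placeholder vocabulary size of 10,000 are assumptions, not from the paper, and the CIDEr-optimizing reinforcement-learning stage and beam-search decoding are not implemented here.

```python
# Hypothetical configuration assembled from the "Experiment Setup" row.
# CaptionConfig, build_optimizer, and the vocabulary size are illustrative.
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class CaptionConfig:
    word_embed_size: int = 256   # word embedding size reported in the paper
    model_size: int = 512        # model (hidden) size reported in the paper
    num_heads: int = 8           # multi-head attention heads, unless otherwise stated
    batch_size: int = 80         # batch size for cross-entropy training
    xe_epochs: int = 25          # cross-entropy epochs with CIDEr-based early stopping
    learning_rate: float = 1e-4  # Adam learning rate
    beam_size: int = 3           # beam search width at inference time


def build_optimizer(model: nn.Module, cfg: CaptionConfig) -> torch.optim.Optimizer:
    """Adam optimizer with the reported learning rate [Kingma and Ba, 2014]."""
    return torch.optim.Adam(model.parameters(), lr=cfg.learning_rate)


if __name__ == "__main__":
    cfg = CaptionConfig()
    # Stand-in module: the paper shares the attribute embedding with the input
    # word embedding; reusing one nn.Embedding for both roles is one way to
    # realize that sharing (the vocabulary size here is a placeholder).
    word_embedding = nn.Embedding(num_embeddings=10_000,
                                  embedding_dim=cfg.word_embed_size)
    optimizer = build_optimizer(word_embedding, cfg)
    print(cfg, optimizer)
```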