Improving Image Captioning with Conditional Generative Adversarial Nets

Authors: Chen Chen, Shuai Mu, Wanpeng Xiao, Zexiong Ye, Liesi Wu, Qi Ju

AAAI 2019, pp. 8142-8150

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we show consistent improvements over all language evaluation metrics for different state-of-the-art image captioning models."
Researcher Affiliation | Industry | Chen Chen, Shuai Mu, Wanpeng Xiao, Zexiong Ye, Liesi Wu, Qi Ju (Tencent AI Lab, Shenzhen, China 518000; {beckhamchen, harrymou, wanpengxiao, joeyye, henrylwu, damonju}@tencent.com)
Pseudocode | Yes | "Algorithm 1 describes the image captioning algorithm via the generative adversarial training method in detail."
Open Source Code | No | The paper does not explicitly state that the source code for the described methodology is released, nor does it provide a link to a code repository.
Open Datasets | Yes | "The most widely used image captioning training and evaluation dataset is the MSCOCO dataset (Lin et al. 2014), which contains 82,783, 40,504, and 40,775 images with 5 captions each for training, validation, and test, respectively. For offline evaluation, following the Karpathy splits from (Karpathy and Fei-Fei 2015), we use a set of 5K images for validation, 5K images for test and 113,287 images for training."
Dataset Splits | Yes | "The most widely used image captioning training and evaluation dataset is the MSCOCO dataset (Lin et al. 2014), which contains 82,783, 40,504, and 40,775 images with 5 captions each for training, validation, and test, respectively. For offline evaluation, following the Karpathy splits from (Karpathy and Fei-Fei 2015), we use a set of 5K images for validation, 5K images for test and 113,287 images for training." (See the split-counting sketch after this table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU or CPU models. It mentions a "training stage" and "experimental experience" but gives no hardware specifications.
Software Dependencies | No | The paper mentions the "ADAM (Kingma and Ba 2014) optimizer" but does not provide specific version numbers for any software, libraries, or dependencies used in the experiments.
Experiment Setup | Yes | "The LSTM hidden dimension for the RNN-based discriminator is 512. The dimension of the image CNN feature and word embedding for both CNN-based and RNN-based discriminators is fixed to 2048. We initialize the discriminator via pre-training the model for 10 epochs by minimizing the cross-entropy loss in Eq. (12) using the ADAM (Kingma and Ba 2014) optimizer with a batch size of 16, an initial learning rate of 1 × 10⁻³ and momentum of 0.9 and 0.999. Similarly, the generator is also pre-trained by MLE for 25 epochs. We use a beam search with a beam size of 5 when validating and testing. The final optimal hyper-parameters of our proposed algorithm are λ = 0.3, g = 1, d = 1 and Q = CIDEr-D." (See the optimizer sketch below.)
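
To make the split numbers quoted under "Open Datasets" and "Dataset Splits" concrete, here is a minimal sketch that tallies the Karpathy split of MSCOCO. It assumes the publicly distributed dataset_coco.json file from Karpathy and Fei-Fei (2015) and its "images"/"split" fields; neither the file name nor that schema is stated in the report.

```python
# Minimal sketch (assumed file name and schema): count MSCOCO images per Karpathy split.
import json
from collections import Counter

with open("dataset_coco.json") as f:  # hypothetical local path to the Karpathy split file
    data = json.load(f)

counts = Counter(img["split"] for img in data["images"])
print(counts)
# Expected per the quote: 5,000 "val" images, 5,000 "test" images, and 113,287
# training images (the 82,783 original training images plus the "restval"
# portion of the original validation set, which is conventionally folded into training).
```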
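
The optimization settings quoted under "Experiment Setup" can be summarized as a short configuration. The sketch below uses PyTorch purely for illustration and is not the authors' implementation; the nn.LSTM placeholder stands in for the paper's RNN-based discriminator, and only the numeric hyper-parameter values come from the quote.

```python
# A minimal PyTorch-style sketch of the quoted settings; the LSTM placeholder
# is an assumption, not the authors' discriminator architecture.
import torch
import torch.nn as nn

hidden_dim = 512    # LSTM hidden dimension of the RNN-based discriminator
feat_dim = 2048     # image CNN feature / word embedding dimension
batch_size = 16
lambda_mix = 0.3    # reward-mixing weight lambda
beam_size = 5       # beam width used for validation and testing

# Placeholder discriminator with the quoted dimensions.
discriminator = nn.LSTM(input_size=feat_dim, hidden_size=hidden_dim, batch_first=True)

# ADAM with initial learning rate 1e-3 and momentum terms (betas) of 0.9 and 0.999,
# used to pre-train the discriminator for 10 epochs on a cross-entropy loss;
# the generator is separately pre-trained by MLE for 25 epochs.
optimizer = torch.optim.Adam(discriminator.parameters(), lr=1e-3, betas=(0.9, 0.999))
```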