Recurrent Relational Memory Network for Unsupervised Image Captioning

Authors: Dan Guo, Yang Wang, Peipei Song, Meng Wang

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The paper experimentally validates the superiority of R2M over state-of-the-art methods on all benchmark datasets (Sections: 3 Experiments; 3.1 Dataset and Metrics; 3.2 Implementation Details; 3.3 Experimental Results and Analysis).
Researcher Affiliation | Academia | Key Laboratory of Knowledge Engineering with Big Data (HFUT), Ministry of Education; School of Computer Science and Information Engineering, Hefei University of Technology (HFUT). Contact: {guodan, yangwang}@hfut.edu.cn, {beta.songpp, eric.mengwang}@gmail.com
Pseudocode | No | The paper describes its method with equations and figures but does not contain a formally labeled pseudocode or algorithm block.
Open Source Code | No | The paper mentions 'the released code of UC-GAN [Feng et al., 2019]', referring to a baseline's code, but provides no link to, or explicit statement about, the authors' own R2M code.
Open Datasets | Yes | The paper tests all existing unsupervised image captioning settings: (1) MSCOCO images [Lin et al., 2014] paired with Shutterstock captions [Feng et al., 2019]; (2) Flickr30k images [Young et al., 2014] paired with MSCOCO captions; and (3) MSCOCO images paired with Google's Conceptual Captions (GCC) [Sharma et al., 2018; Laina et al., 2019]. Open Images-v4 [Krasin et al., 2017; Kuznetsova et al., 2018] is also used, for the pre-trained visual-concept detector.
Dataset Splits | No | The paper mentions 'In the test splits of datasets' and discusses training stages, but it does not specify explicit train/validation/test split percentages or sample counts for the datasets used.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions software components such as the Adam optimizer, Faster R-CNN, LSTM, and Inception-V4, but it does not provide version numbers for these packages or for any other ancillary software components.
Experiment Setup | Yes | The margin in Eq. 12 is m = 0.2. The Adam optimizer is adopted with a batch size of 256. For the three datasets, the hyper-parameters (β, γ) are set to (1, 1), (1, 1), and (0.2, 0.2), respectively. The model is first trained with the loss L_XE under a learning rate of 1e-4, then fine-tuned with the joint loss L_S. After that, L_I,M is used for training with a learning rate of 1e-5, and finally the model is jointly trained with L_I. At test time, beam search [Anderson et al., 2017] with a width of 3 is used. The visual dictionary D in Fig. 2 is collected by a Faster R-CNN [Huang et al., 2017] pre-trained on Open Images-v4. The vocabulary sizes of the three datasets are 18,679 / 11,335 / 10,652. Visual concepts are filtered from images with a detected score ≥ 0.3. Both the LSTM and RM memory sizes are set to N = 1 and d = 512. The multi-head self-attention parameters are H = 2, d_k = d_K = 256, and d_v = d_V = 256.
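
To make the hyper-parameters quoted in the Experiment Setup row easier to scan, the sketch below collects them into a single configuration object. This is a minimal illustration assuming a Python training script; the class name `R2MConfig` and the per-dataset keys are invented for this example and do not come from the authors' (unreleased) code.

```python
from dataclasses import dataclass, field

@dataclass
class R2MConfig:
    # Margin in Eq. 12 (quoted as m = 0.2).
    margin: float = 0.2
    # Adam optimizer with batch size 256.
    optimizer: str = "adam"
    batch_size: int = 256
    # Learning rates: 1e-4 for the L_XE stage, 1e-5 for the later stages.
    lr_xe: float = 1e-4
    lr_later_stages: float = 1e-5
    # Beam search width at test time.
    beam_width: int = 3
    # Threshold on the detected score when filtering visual concepts.
    concept_score_threshold: float = 0.3
    # LSTM / relational-memory sizes: N = 1 memory slot, hidden size d = 512.
    memory_slots: int = 1
    hidden_dim: int = 512
    # Multi-head self-attention: H = 2 heads, key/value dimensions of 256.
    num_heads: int = 2
    dim_key: int = 256
    dim_value: int = 256
    # (beta, gamma) loss weights per dataset pairing, as quoted above.
    loss_weights: dict = field(default_factory=lambda: {
        "mscoco_shutterstock": (1.0, 1.0),
        "flickr30k_mscoco": (1.0, 1.0),
        "mscoco_gcc": (0.2, 0.2),
    })

# Example usage: instantiate defaults and inspect the GCC loss weights.
cfg = R2MConfig()
print(cfg.loss_weights["mscoco_gcc"])  # (0.2, 0.2)
```

The staged training schedule itself (L_XE, then L_S, then L_I,M, then L_I) is described only in prose in the paper, so it is not encoded here beyond the two learning rates.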