Recurrent Relational Memory Network for Unsupervised Image Captioning
Authors: Dan Guo, Yang Wang, Peipei Song, Meng Wang
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally validate the superiority of R2M over state-of-the-art methods on all benchmark datasets. The paper contains a dedicated experiments section (3 Experiments, with subsections 3.1 Dataset and Metrics, 3.2 Implementation Details, and 3.3 Experimental Results and Analysis). |
| Researcher Affiliation | Academia | Key Laboratory of Knowledge Engineering with Big Data (HFUT), Ministry of Education; School of Computer Science and Information Engineering, Hefei University of Technology (HFUT); {guodan, yangwang}@hfut.edu.cn, {beta.songpp, eric.mengwang}@gmail.com |
| Pseudocode | No | The paper describes methods with equations and figures but does not contain a formally labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper mentions 'With the released code of UC-GAN [Feng et al., 2019]', referring to a baseline's code, but does not provide a link or explicit statement for their own R2M code. |
| Open Datasets | Yes | We test all the existing unsupervised image captioning datasets, including (1) MSCOCO images [Lin et al., 2014] paired with Shutterstock captions [Feng et al., 2019]; (2) Flickr30k images [Young et al., 2014] paired with MSCOCO captions; and (3) MSCOCO images paired with Google's Conceptual Captions (GCC) [Sharma et al., 2018; Laina et al., 2019]. The visual concept detector is pre-trained on Open Images-v4 [Krasin et al., 2017; Kuznetsova et al., 2018]. |
| Dataset Splits | No | The paper mentions 'In the test splits of datasets' and discusses training stages, but it does not specify explicit train/validation/test split percentages or sample counts for the datasets used. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer', 'Faster R-CNN', 'LSTM', and 'Inception-V4', but it does not provide specific version numbers for these software packages or any other ancillary software components. |
| Experiment Setup | Yes | The margin in Eq. 12 is m = 0.2. The Adam optimizer is adopted with a batch size of 256. For the three datasets, the hyper-parameters (β, γ) are set to (1, 1), (1, 1), and (0.2, 0.2), respectively. We train the model with the loss L_XE under a learning rate of 10^-4, then fine-tune it with the joint loss L_S. After that, L_IM is used to train with a learning rate of 10^-5. Finally, we jointly train the model with L_I. In the test process, we use the beam search tactic [Anderson et al., 2017] with a width of 3. The visual dictionary D in Fig. 2 is collected by a Faster R-CNN [Huang et al., 2017] pre-trained on Open Images-v4. The vocabulary sizes of the three datasets are 18,679/11,335/10,652. For the experimental setting, we filter out visual concepts from images with a detected score ≥ 0.3. Both the sizes of the LSTM and the RM memory are set to N = 1 and d = 512. The parameters of multi-head self-attention are H = 2, d_k = d_K = 256, and d_v = d_V = 256. |
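
For readers reconstructing the reported setup, the hyper-parameters quoted in the Experiment Setup row can be collected into a single configuration. The sketch below is a minimal, framework-agnostic illustration; the dictionary name `R2M_CONFIG`, all key names, and the `staged_training_plan` helper are our own labels (not identifiers from the paper or any released code), and the learning rates assumed for the L_S and L_I stages are guesses since the paper does not state them explicitly.

```python
# Minimal configuration sketch of the experiment setup reported for R2M (IJCAI 2020).
# Numeric values come from the paper's Implementation Details; the layout,
# key names, and helper function are illustrative assumptions.

R2M_CONFIG = {
    "margin": 0.2,                    # margin m in Eq. 12
    "batch_size": 256,                # Adam optimizer batch size
    "lr_xe": 1e-4,                    # learning rate for the L_XE stage
    "lr_im": 1e-5,                    # learning rate for the L_IM stage
    "beta_gamma": {                   # (beta, gamma) per dataset
        "mscoco_shutterstock": (1.0, 1.0),
        "flickr30k_mscoco": (1.0, 1.0),
        "mscoco_gcc": (0.2, 0.2),
    },
    "beam_width": 3,                  # beam search width at test time
    "concept_score_threshold": 0.3,   # detected visual-concept score threshold
    "memory": {"N": 1, "d": 512},     # LSTM / relational-memory sizes
    "attention": {"H": 2, "d_k": 256, "d_v": 256},  # multi-head self-attention
    "vocab_sizes": [18679, 11335, 10652],           # per-dataset vocabulary sizes
}


def staged_training_plan(cfg):
    """Return the four-stage schedule described in the paper:
    L_XE -> joint L_S fine-tuning -> L_IM -> joint L_I.
    Learning rates for the L_S and L_I stages are not reported;
    reusing the preceding stage's rate here is an assumption."""
    return [
        ("L_XE", cfg["lr_xe"]),
        ("L_S (joint fine-tune)", cfg["lr_xe"]),
        ("L_IM", cfg["lr_im"]),
        ("L_I (joint)", cfg["lr_im"]),
    ]


if __name__ == "__main__":
    for stage, lr in staged_training_plan(R2M_CONFIG):
        print(f"train with {stage} at lr={lr}")
```

The staged plan simply mirrors the order of losses quoted above (L_XE, then joint L_S, then L_IM, then joint L_I); it is a reading aid, not a reproduction of the authors' training script.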