Relational Distant Supervision for Image Captioning without Image-Text Pairs
Authors: Yayun Qi, Wentian Zhao, Xinxiao Wu
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Promising results on three datasets show that our method outperforms the state-of-the-art methods of unsupervised image captioning. [...] To evaluate the effectiveness of our method, we conduct experiments on three different datasets |
| Researcher Affiliation | Academia | (1) Beijing Key Laboratory of Intelligent Information Technology, School of Computer Science & Technology, Beijing Institute of Technology, China; (2) Guangdong Laboratory of Machine Perception and Intelligent Computing, Shenzhen MSU-BIT University, China |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about the release of its source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | To evaluate the effectiveness of our method, we conduct experiments on three different datasets: (1) COCO-Shutterstock with images from COCO (Lin et al. 2014) and sentence corpus from Shutterstock (Feng et al. 2019); (2) Flickr30K-COCO with images from Flickr30K (Plummer et al. 2015) and sentence corpus from COCO; (3) COCO-GCC with images from COCO and sentence corpus from GCC (Sharma et al. 2018). |
| Dataset Splits | No | The paper discusses training and evaluation but does not explicitly provide specific training/validation/test dataset splits (percentages, sample counts, or explicit predefined splits) for reproducibility. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., CPU, GPU models, memory, or cloud instance types) used to run the experiments. |
| Software Dependencies | No | The paper lists several software tools and frameworks used (e.g., the Stanford CoreNLP toolkit, the NLTK toolkit, GloVe, Transformer, ResNet-101, a Faster R-CNN detector, and the Adam optimizer), but does not provide specific version numbers for any of these dependencies. A dependency-parsing sketch using these toolkits follows the table. |
| Experiment Setup | Yes | In the relationship learning module, the hidden layer dimensions of the MLP and the Transformer encoder are set to 1024, and the layer number N_rec of the Transformer encoder is set to 2. In the relationship-to-sentence module, the hidden layer dimensions of the GCN and the Transformer are set to 512. The layer numbers N_enc and N_dec of the Transformer encoder and decoder are both set to 6. In the image captioning module, the word embedding dimension and the hidden layer dimension are both set to 512. The Adam optimizer (Kingma and Ba 2014) is adopted. The learning rates are set to 5×10⁻⁵, 1×10⁻⁴, and 1×10⁻⁴ for the relationship learning module, the relationship-to-sentence module, and the image captioning module, respectively. A configuration sketch based on these values follows the table. |
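
To make the quoted experiment setup concrete, here is a minimal PyTorch sketch of the three modules' hyperparameters and per-module optimizers. Only the hidden dimensions, layer counts, and learning rates come from the paper; the module class names, the attention head count, and the captioning decoder architecture are hypothetical placeholders, since the quoted excerpt does not specify them.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the paper's three modules; only the hidden
# dimensions, layer counts, and learning rates are taken from the paper.

class RelationshipLearning(nn.Module):
    def __init__(self, d_model=1024, n_rec=2):  # hidden dim 1024, N_rec = 2
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                 nn.Linear(d_model, d_model))
        # nhead=8 is an assumption; the excerpt does not state the head count
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_rec)

class RelationshipToSentence(nn.Module):
    def __init__(self, d_model=512, n_enc=6, n_dec=6):  # dim 512, N_enc = N_dec = 6
        super().__init__()
        self.gcn = nn.Linear(d_model, d_model)  # GCN layer sketched as a linear map
        self.transformer = nn.Transformer(d_model=d_model, nhead=8,
                                          num_encoder_layers=n_enc,
                                          num_decoder_layers=n_dec)

class ImageCaptioning(nn.Module):
    def __init__(self, vocab_size=10000, d_model=512):  # embedding and hidden dims 512
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Decoder architecture is not specified in the excerpt;
        # a single-layer LSTM is used here purely as a placeholder.
        self.decoder = nn.LSTM(d_model, d_model, batch_first=True)

# One Adam optimizer per module, with the learning rates quoted above.
modules_and_lrs = [(RelationshipLearning(), 5e-5),
                   (RelationshipToSentence(), 1e-4),
                   (ImageCaptioning(), 1e-4)]
optimizers = [torch.optim.Adam(m.parameters(), lr=lr) for m, lr in modules_and_lrs]
```

Keeping a separate optimizer per module mirrors the per-module learning rates the paper reports, and makes it straightforward to train the modules in stages.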
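
The dependencies row notes that the paper uses the Stanford CoreNLP toolkit and NLTK for processing the sentence corpus. The snippet below is a generic sketch, not the paper's actual extraction procedure: it shows one common way to pull (subject, predicate, object) candidates out of corpus sentences with NLTK's CoreNLP bindings, assuming a CoreNLP server is running locally. The function name and the relation labels checked are illustrative choices.

```python
from nltk.parse.corenlp import CoreNLPDependencyParser

# Assumes a Stanford CoreNLP server is running locally, e.g.:
#   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
parser = CoreNLPDependencyParser(url='http://localhost:9000')

def extract_triplets(sentence):
    """Collect (subject, predicate, object) candidates from one dependency parse."""
    graph = next(parser.raw_parse(sentence))
    subjects, objects = {}, {}
    # graph.triples() yields ((governor, tag), relation, (dependent, tag))
    for (head, _), rel, (dep, _) in graph.triples():
        if rel == 'nsubj':
            subjects[head] = dep
        elif rel in ('obj', 'dobj'):  # label depends on the CoreNLP version
            objects[head] = dep
    return [(subjects[v], v, objects[v]) for v in subjects if v in objects]

print(extract_triplets('A man rides a horse on the beach.'))
# e.g. [('man', 'rides', 'horse')]
```

Triplets of this form are the kind of relational supervision the paper's title refers to: they can be mined from a text-only corpus without any paired images.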