CIC: A Framework for Culturally-Aware Image Captioning

Authors: Youngsik Yun, Jihie Kim

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our human evaluation conducted on 45 participants from 4 different cultural groups with a high understanding of the corresponding culture shows that our proposed framework generates more culturally descriptive captions when compared to the image captioning baseline based on VLPs. Resources can be found at https://shane3606.github.io/cic."
Researcher Affiliation | Academia | "Youngsik Yun¹ and Jihie Kim²; ¹Department of Computer Science and Artificial Intelligence, Dongguk University; ²Division of AI Software Convergence, Dongguk University"
Pseudocode | No | The paper describes the method using textual descriptions and a diagram, but it does not include pseudocode or a clearly labeled algorithm block.
Open Source Code | Yes | "Resources can be found at https://shane3606.github.io/cic."
Open Datasets | Yes | "We validated our framework using GD-VCR [Yin et al., 2021], a multiple-choice QA testing set designed to evaluate the ability of multi-modal models to understand geo-diverse commonsense knowledge."
Dataset Splits | Yes | "We validated our framework using GD-VCR [Yin et al., 2021], a multiple-choice QA testing set designed to evaluate the ability of multi-modal models to understand geo-diverse commonsense knowledge."
Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions ChatGPT and BLIP-2 but does not specify their version numbers or other software dependencies with explicit versions.
Experiment Setup | Yes | "The temperature is set to 0.6, and the maximum length for caption generation is 100." (A decoding sketch follows the table.)
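
The Experiment Setup row is the only concrete generation configuration the assessment extracts. As a minimal sketch of how those two parameters could be applied, the snippet below wires them into a BLIP-2 caption call via the Hugging Face transformers library. The paper mentions both ChatGPT and BLIP-2 without stating which component these values configure, so the checkpoint name, the sampling flag, and the image path here are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): applying the reported
# decoding parameters (temperature 0.6, maximum caption length 100)
# to a BLIP-2 captioner through Hugging Face transformers.
# The checkpoint, do_sample flag, and image path are assumptions.
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

image = Image.open("example.jpg").convert("RGB")  # hypothetical input image
inputs = processor(images=image, return_tensors="pt")

# Temperature only takes effect when sampling is enabled.
output_ids = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.6,  # reported temperature
    max_length=100,   # reported maximum caption length
)
caption = processor.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```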