Aesthetically Relevant Image Captioning

Authors: Zhipeng Zhong, Fei Zhou, Guoping Qiu

AAAI 2023

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | We present extensive experimental results to show the soundness of the ARS concept and the effectiveness of the ARIC model by demonstrating that texts with higher ARSs can predict the aesthetic ratings more accurately and that the new ARIC model can generate more accurate, aesthetically more relevant and more diverse image captions.
Researcher Affiliation | Academia | 1. College of Electronics and Information Engineering, Shenzhen University, China; 2. Peng Cheng National Laboratory, Shenzhen, China; 3. Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen, China; 4. Shenzhen Institute for Artificial Intelligence and Robotics for Society, China; 5. Guangdong-Hong Kong Joint Laboratory for Big Data Imaging and Communication, Shenzhen, China; 6. School of Computer Science, The University of Nottingham, UK
Pseudocode | No | The paper describes its models and methods but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Furthermore, a large new research database containing 510K images with over 5 million comments and 350K aesthetic scores, and code for implementing ARIC are available at https://github.com/PengZai/ARIC.
Open Datasets | Yes | Furthermore, a large new research database containing 510K images with over 5 million comments and 350K aesthetic scores, and code for implementing ARIC are available at https://github.com/PengZai/ARIC.
Dataset Splits | Yes | We divide Set B into a test set containing 106,971 images and a validation set containing 10,698 images, and the remaining 232,331 images are used as the training set. (A reproduction sketch follows the table.)
Hardware Specification | Yes | All experiments were performed on a machine with 4 NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions software components such as 'GPT2', 'spaCy', 'TEXTCNN', 'TEXTRCNN', 'BERT', 'ROBERTA', and 'CLIP' but does not specify their version numbers or other key software dependencies with versions.
Experiment Setup | Yes | Adam optimizer with a learning rate of 2e-5 without weight decay was used. For IAC, we used the pretrained GPT2 model and set token size to 64. All experiments were performed on a machine with 4 NVIDIA A100 GPUs. Finally, we limit the number of tokens for each image's comment to 512. (See the training-setup sketch after the table.)
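
The reported split sizes (232,331 / 10,698 / 106,971) sum to exactly 350,000, i.e. the subset of the database with aesthetic scores. The paper excerpt does not say how the partition was drawn, so the following is a minimal Python sketch only; the function name `split_set_b`, the list-of-ids representation, and the fixed seed are our own assumptions, not the authors' code.

```python
import random

# Split sizes as reported in the paper; they sum to 350,000 images.
TRAIN_N, VAL_N, TEST_N = 232_331, 10_698, 106_971

def split_set_b(image_ids, seed=0):
    """Shuffle Set B image ids and carve out train/val/test partitions.

    Hypothetical reconstruction: the paper states the partition sizes
    but not the sampling procedure, so a seeded random shuffle is
    assumed here purely for illustration.
    """
    assert len(image_ids) == TRAIN_N + VAL_N + TEST_N  # 350,000 total
    rng = random.Random(seed)
    ids = list(image_ids)
    rng.shuffle(ids)
    train = ids[:TRAIN_N]
    val = ids[TRAIN_N:TRAIN_N + VAL_N]
    test = ids[TRAIN_N + VAL_N:]
    return train, val, test
```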
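
The Experiment Setup row pins down the optimizer, learning rate, and token limits, which translate directly into a PyTorch / Hugging Face configuration. Below is a minimal sketch under that reading, not the authors' implementation: the `"gpt2"` checkpoint name, the `encode_comment` and `generate_caption` helpers, and unconditional text generation (the actual ARIC model conditions on image features) are all our assumptions.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Limits reported in the paper: captions use a 64-token size for IAC,
# and each image's comment text is truncated to 512 tokens.
MAX_CAPTION_TOKENS = 64
MAX_COMMENT_TOKENS = 512

# Pretrained GPT-2, as stated in the setup (checkpoint name assumed).
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Adam with lr = 2e-5 and no weight decay, as reported.
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5, weight_decay=0.0)

def encode_comment(text):
    """Tokenize an image's comment, truncated to the 512-token limit."""
    return tokenizer(text, truncation=True,
                     max_length=MAX_COMMENT_TOKENS, return_tensors="pt")

def generate_caption(prompt_ids):
    """Generate text capped at the 64-token caption size (illustrative:
    the real ARIC model also conditions on image features)."""
    return model.generate(prompt_ids, max_length=MAX_CAPTION_TOKENS)
```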