Aesthetically Relevant Image Captioning
Authors: Zhipeng Zhong, Fei Zhou, Guoping Qiu
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present extensive experimental results to show the soundness of the ARS concept and the effectiveness of the ARIC model by demonstrating that texts with higher ARSs can predict the aesthetic ratings more accurately and that the new ARIC model can generate more accurate, aesthetically more relevant and more diverse image captions. |
| Researcher Affiliation | Academia | 1College of Electronics and Information Engineering, Shenzhen University, China 2Peng Cheng National Laboratory, Shenzhen, China 3Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen, China 4Shenzhen Institute for Artificial Intelligence and Robotics for Society, China 5Guangdong-Hong Kong Joint Laboratory for Big Data Imaging and Communication, Shenzhen, China 6School of Computer Science, The University of Nottingham, UK |
| Pseudocode | No | The paper describes its models and methods but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Furthermore, a large new research database containing 510K images with over 5 million comments and 350K aesthetic scores, and code for implementing ARIC are available at https://github.com/PengZai/ARIC. |
| Open Datasets | Yes | Furthermore, a large new research database containing 510K images with over 5 million comments and 350K aesthetic scores, and code for implementing ARIC are available at https://github.com/PengZai/ARIC. |
| Dataset Splits | Yes | We divide Set B into a test set containing 106,971 images and a validation set containing 10,698 images, and the remaining 232,331 images are used as the training set. (A hedged split sketch is given after the table.) |
| Hardware Specification | Yes | All experiments were performed on a machine with 4 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions software components like 'GPT2 model', 'spaCy', 'TEXTCNN', 'TEXTRCNN', 'BERT', 'ROBERTA', and 'CLIP' but does not specify their version numbers or other key software dependencies with versions. |
| Experiment Setup | Yes | Adam optimizer with a learning rate of 2e-5 without weight decay was used. For IAC, we used the pretrained GPT2 model and set token size to 64. All experiments were performed on a machine with 4 NVIDIA A100 GPUs. Finally, we limit the number of tokens for each image's comment to 512. (A hedged training-setup sketch is given after the table.) |
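
The Dataset Splits row reports fixed split sizes for Set B (232,331 train / 10,698 validation / 106,971 test) but the quoted passage does not say how images are assigned to each partition. Below is a minimal sketch, assuming the split is produced by a seeded random shuffle of image identifiers; the function name `split_set_b` and the seeding are illustrative assumptions, not the authors' code.

```python
import random

# Split sizes for Set B as reported in the Dataset Splits row:
# 232,331 train / 10,698 validation / 106,971 test.
VAL_SIZE = 10_698
TEST_SIZE = 106_971

def split_set_b(image_ids, seed=0):
    """Partition Set B image ids into train/val/test lists (illustrative)."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)          # assumed: a seeded random split
    test_ids = ids[:TEST_SIZE]
    val_ids = ids[TEST_SIZE:TEST_SIZE + VAL_SIZE]
    train_ids = ids[TEST_SIZE + VAL_SIZE:]    # the remaining 232,331 images
    return train_ids, val_ids, test_ids
```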
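
The Experiment Setup row quotes the optimizer, learning rate, and token limits but not the surrounding training code. The sketch below shows one way those settings could be wired up, assuming the Hugging Face `transformers` GPT-2 implementation as a stand-in for the paper's IAC captioner; `training_step` and the padding/label-masking details are assumptions for illustration, and the image-feature conditioning used by ARIC is omitted in this text-only sketch.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Hedged sketch of the reported configuration: Adam, lr 2e-5, no weight
# decay, pretrained GPT-2, caption token size 64.  The paper additionally
# caps each image's comment text at 512 tokens when encoding comments.
MAX_CAPTION_TOKENS = 64

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token        # GPT-2 ships without a pad token
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Adam with learning rate 2e-5 and no weight decay, as stated in the paper.
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5, weight_decay=0.0)

def training_step(captions):
    """One illustrative language-modelling step on a batch of caption strings."""
    batch = tokenizer(
        captions,
        truncation=True,
        max_length=MAX_CAPTION_TOKENS,
        padding="max_length",
        return_tensors="pt",
    )
    labels = batch["input_ids"].clone()
    labels[batch["attention_mask"] == 0] = -100  # ignore padding in the loss
    outputs = model(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        labels=labels,
    )
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```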