Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].
MsRAG: Knowledge Augumented Image Captioning with Object-level Multi-source RAG
Authors: Yuming Qiao, Yuechen Wang, Dan Meng, Haonan Lu, Zhenyu Yang, Xudong Zhang
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate the effectiveness of MsRAG, we conducted a series of qualitative and quantitative experiments. The evaluation results demonstrate the superiority of MsRAG over other methods. |
| Researcher Affiliation | Collaboration | Yuming Qiao¹, Yuechen Wang¹, Dan Meng¹, Haonan Lu², Zhenyu Yang², Xudong Zhang¹ (¹OPPO Research Institute; ²OPPO AI Center) |
| Pseudocode | No | The paper describes the MsRAG framework and its components (Parallel Visual Search Module, Prompt Templates Pool, Visual-RAG Alignment Module) through descriptive text and architectural diagrams (Fig. 2, Fig. 3), but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code, nor does it provide any links to a code repository. |
| Open Datasets | Yes | We evaluate MsRAG on LVLMs using three datasets: Cap Fusion, Kale, and KAC-dataset. Cap Fusion and Kale are public captioning datasets with real-world knowledge, aligning well with the knowledge-augmented captioning task, effectively testing MsRAG's retrieval and utilization of external information without queries. [Yu et al., 2024] [Awadalla et al., 2024] |
| Dataset Splits | No | The paper introduces the KAC-dataset and mentions using Cap Fusion and Kale for evaluation but does not specify any training, validation, or test dataset splits (e.g., percentages, sample counts, or specific predefined splits). |
| Hardware Specification | Yes | All experiments were run on two Nvidia A100s. |
| Software Dependencies | No | For closed-source models (GPT-4o, Claude), we use their APIs; for open-source models, we deploy them with vLLM [Kwon et al., 2023]. Specific version numbers for software dependencies are not provided. |
| Experiment Setup | No | The paper describes the overall MsRAG framework and mentions integrating various LVLMs (GPT-4o, Claude-3.5-Sonnet, Qwen2-VL, and InternVL2), but it does not provide specific experimental setup details such as hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations. |