Generative Multi-Modal Knowledge Retrieval with Large Language Models
Authors: Xinwei Long, Jiali Zeng, Fandong Meng, Zhiyuan Ma, Kaiyan Zhang, Bowen Zhou, Jie Zhou
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments conducted on three benchmarks, we demonstrate significant improvements ranging from 3.0% to 14.6% across all evaluation metrics when compared to strong baselines. |
| Researcher Affiliation | Collaboration | 1Department of Electronic Engineering, Tsinghua University, Beijing, China; 2Pattern Recognition Center, WeChat AI, Tencent Inc., China |
| Pseudocode | No | The paper describes the model's architecture and processes but does not include a dedicated pseudocode or algorithm block. |
| Open Source Code | No | The code will be released in this repository: https://github.com/xinwei666/MMGenerativeIR (promised, but not yet publicly available at the time of this review). |
| Open Datasets | Yes | We conduct experiments on three benchmarks of multi-modal knowledge retrieval: OKVQA-GS112K (Luo et al. 2021a), OKVQA-WK21M (Luo et al. 2023b) and ReMuQ (Luo et al. 2023b) |
| Dataset Splits | Yes | Train/Val/Test: OKVQA-GS112K 8,062/896/5,046; OKVQA-WK21M 8,062/896/5,046; ReMuQ 7,576/842/3,609 |
| Hardware Specification | Yes | Training is performed on an NVIDIA A6000 48G GPU and completed within three hours. |
| Software Dependencies | No | Our model is implemented in PyTorch and trained using a learning rate of 6e-5, the Adam optimizer with a warmup strategy, and batches of 12 instruction data... We use YOLOv7 (Wang, Bochkovskiy, and Liao 2022) to obtain bounding boxes... The paper names PyTorch and YOLOv7 but does not give a version number for either. |
| Experiment Setup | Yes | Our model is implemented in PyTorch and trained using a learning rate of 6e-5, the Adam optimizer with a warmup strategy, and a batch size of 12 instruction examples. (A hedged configuration sketch follows the table.) |
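
For reproduction purposes, the reported setup maps onto a few lines of PyTorch. The following is a minimal sketch, not the authors' code: the warmup length, total step count, schedule shape, and the placeholder `model` are all assumptions, since the paper reports only the learning rate (6e-5), the Adam optimizer with a warmup strategy, and a batch size of 12.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR

# Hyperparameters reported in the paper: lr 6e-5, Adam with warmup, batch size 12.
LR = 6e-5
BATCH_SIZE = 12
WARMUP_STEPS = 500   # assumption: the warmup length is not reported
TOTAL_STEPS = 5_000  # assumption: the total step count is not reported

# Placeholder module standing in for the paper's generative retriever,
# which was not publicly released at the time of this review.
model = torch.nn.Linear(768, 768)

optimizer = Adam(model.parameters(), lr=LR)

# Linear warmup followed by linear decay, one common reading of
# "warmup strategy"; the paper does not specify the schedule shape.
def lr_lambda(step: int) -> float:
    if step < WARMUP_STEPS:
        return step / max(1, WARMUP_STEPS)
    return max(0.0, (TOTAL_STEPS - step) / max(1, TOTAL_STEPS - WARMUP_STEPS))

scheduler = LambdaLR(optimizer, lr_lambda)
```

Calling `scheduler.step()` once per optimizer update yields the warmup-then-decay behaviour; under the paper's reported setup, training completes within three hours on a single NVIDIA A6000 48G GPU.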