Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting

Authors: Hejie Cui, Xinyu Fang, Zihan Zhang, Ran Xu, Xuan Kan, Xin Liu, Yue Yu, Manling Li, Yangqiu Song, Carl Yang

NeurIPS 2023

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive knowledge quality evaluations highlight the correctness and uniqueness of the extracted open visual knowledge by OpenVik. Moreover, integrating our extracted knowledge across various visual reasoning applications shows consistent improvements, indicating the real-world applicability of OpenVik.
Researcher Affiliation Academia Hejie Cui1, Xinyu Fang2, Zihan Zhang2, Ran Xu1, Xuan Kan1, Xin Liu3, Yue Yu4, Manling Li5, Yangqiu Song3, Carl Yang1; 1Emory University, 2Tongji University, 3The Hong Kong University of Science and Technology, 4Georgia Institute of Technology, 5Northwestern University
Pseudocode No The paper describes its methods textually and visually (Figure 1) but does not include any explicit pseudocode or algorithm blocks.
Open Source Code No The paper does not include an explicit statement or link for the open-source code of its methodology.
Open Datasets Yes Our training data are built based on Visual Genome [24] and its relation-enhanced version Dense Relational Captioning [22].
Dataset Splits Yes The dataset statistic information is summarized in Table 8 in the Appendix B. Table 8:
split | #image | #descriptor | #relation | #subject & object
Train | 75,456 | 832,351 | 30,241 | 302,735
Validation | 4,871 | 64,137 | 5,164 | 34,177
Test | 4,873 | 62,579 | 5,031 | 32,384
Hardware Specification Yes Our model is implemented in PyTorch [35] and trained on two Quadro RTX 8000 GPUs.
Software Dependencies No The paper states 'Our model is implemented in PyTorch [35]' but does not provide specific version numbers for PyTorch or other software dependencies.
Experiment Setup Yes Full details on learning parameters can be referred to in Appendix C. Table 9: batch size 4; optimizer Adam; Adam epsilon 1e-8; initial learning rate 1e-5; learning rate scheduler cosine; Adam weight decay 0.05. Table 10: batch size 4; optimizer Adam; Adam epsilon 1e-8; initial learning rate 1e-5; learning rate scheduler cosine; Adam weight decay 0.05; α 0.7; ϕ 0.01. The open relational region detector is initialized from the ResNet50-FPN backbone, then finetuned for another 20 epochs. The format-free visual knowledge generator is initialized from BLIP-base with the basic ViT-B/16 and finetuned for 20 epochs.
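For concreteness, the reported hyperparameters (Adam, epsilon 1e-8, initial learning rate 1e-5, cosine scheduler, 20 epochs) can be sketched as a minimal learning-rate schedule. This is an illustrative assumption of a standard cosine decay to a zero floor; the paper does not specify the scheduler's exact form, and the names `cosine_lr` and `TOTAL_EPOCHS` are hypothetical.

```python
import math

# Reported settings (Tables 9-10): Adam, epsilon 1e-8, initial LR 1e-5,
# weight decay 0.05, batch size 4, cosine LR scheduler, 20 epochs.
# The decay-to-zero floor below is an assumption, not from the paper.
INITIAL_LR = 1e-5
TOTAL_EPOCHS = 20

def cosine_lr(epoch: int, base_lr: float = INITIAL_LR,
              total_epochs: int = TOTAL_EPOCHS, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate at a 0-indexed epoch."""
    progress = epoch / max(1, total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Per-epoch schedule, starting at the reported initial learning rate
# and decaying monotonically over the 20 finetuning epochs.
schedule = [cosine_lr(e) for e in range(TOTAL_EPOCHS)]
```

In PyTorch this would typically correspond to `torch.optim.Adam` combined with `torch.optim.lr_scheduler.CosineAnnealingLR`.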