OVIS: Open-Vocabulary Visual Instance Search via Visual-Semantic Aligned Representation Learning

Authors: Sheng Liu, Kevin Lin, Lijuan Wang, Junsong Yuan, Zicheng Liu

AAAI 2022, pp. 1773-1781 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments on the two datasets, we demonstrate ViSA's ability to search for visual instances in images not available during training, given a wide range of textual queries including those composed of uncommon words. Experimental results show that ViSA achieves an mAP@50 of 27.8% on OVIS40 and a recall@30 of 21.3% on the OVIS1400 dataset under the most challenging settings.
Researcher Affiliation | Collaboration | Sheng Liu1, Kevin Lin2, Lijuan Wang2, Junsong Yuan1, Zicheng Liu2 (1University at Buffalo, 2Microsoft); {sliu66, jsyuan}@buffalo.edu, {keli, lijuanw, zliu}@microsoft.com
Pseudocode | No | The paper includes diagrams to illustrate the model and training process, but it does not contain any formal pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor a link to a code repository.
Open Datasets | Yes | We use three image captioning datasets, i.e., Conceptual Captions (CC) (Sharma et al. 2018), SBU Captions (Ordonez, Kulkarni, and Berg 2011), and COCO Captions (Lin et al. 2014), to train our model (for MTP). [...] We also use 98K images with a set of 1,600 categories of visual instance label annotations from Visual Genome (Krishna et al. 2017) to train our model (for ILP).
Dataset Splits | No | The paper mentions using Conceptual Captions, SBU Captions, COCO Captions, and Visual Genome for training, and OVIS40/OVIS1400 for evaluation, but it does not specify explicit train/validation/test splits for these datasets or reference standard splits for reproducibility.
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU models, CPU types, or cloud computing instances) used for running the experiments.
Software Dependencies | No | The paper mentions BERT-Base, the AdamW optimizer, and Faster R-CNN, but does not provide version numbers for any software dependencies or libraries.
Experiment Setup | Yes | We train our ViSA model for 30 epochs with a batch size of 512 using the AdamW optimizer (Loshchilov and Hutter 2019). The learning rate is set to 0.00001.
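
The only training details the paper states are those quoted above (AdamW, learning rate 0.00001, batch size 512, 30 epochs). The sketch below wires those hyperparameters into a PyTorch training loop; since no code is released, the model, loss, and data here are placeholders (PlaceholderViSA, the dummy TensorDataset, and the toy alignment loss are all assumptions, not the authors' implementation).

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in model: the real ViSA architecture (BERT-Base text encoder plus
# Faster R-CNN region features) is not released, so a tiny placeholder
# module is used purely to show the optimizer/schedule wiring.
class PlaceholderViSA(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.visual_proj = nn.Linear(2048, dim)  # region-feature size of 2048 is assumed
        self.text_proj = nn.Linear(768, dim)     # BERT-Base hidden size is 768

    def forward(self, region_feats, text_feats):
        v = self.visual_proj(region_feats)
        t = self.text_proj(text_feats)
        # Toy alignment loss: pull paired visual/text embeddings together.
        return (1 - nn.functional.cosine_similarity(v, t)).mean()

# Dummy tensors standing in for image-caption training pairs.
dataset = TensorDataset(torch.randn(2048, 2048), torch.randn(2048, 768))
loader = DataLoader(dataset, batch_size=512, shuffle=True)  # batch size 512 (from the paper)

model = PlaceholderViSA()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # AdamW, lr = 0.00001 (from the paper)

for epoch in range(30):  # 30 epochs (from the paper)
    for region_feats, text_feats in loader:
        loss = model(region_feats, text_feats)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()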