OVIS: Open-Vocabulary Visual Instance Search via Visual-Semantic Aligned Representation Learning
Authors: Sheng Liu, Kevin Lin, Lijuan Wang, Junsong Yuan, Zicheng Liu
AAAI 2022, pp. 1773-1781
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on the two datasets, we demonstrate ViSA's ability to search for visual instances in images not available during training given a wide range of textual queries, including those composed of uncommon words. Experimental results show that ViSA achieves an mAP@50 of 27.8% on OVIS40 and achieves a recall@30 of 21.3% on OVIS1400 dataset under the most challenging settings. |
| Researcher Affiliation | Collaboration | Sheng Liu^1, Kevin Lin^2, Lijuan Wang^2, Junsong Yuan^1, Zicheng Liu^2; ^1University at Buffalo, ^2Microsoft; {sliu66, jsyuan}@buffalo.edu, {keli, lijuanw, zliu}@microsoft.com |
| Pseudocode | No | The paper includes diagrams to illustrate the model and training process, but it does not contain any formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | We use three image captioning datasets, i.e., Conceptual Captions (CC) (Sharma et al. 2018), SBU Captions (Ordonez, Kulkarni, and Berg 2011), and COCO Captions (Lin et al. 2014) to train our model (for MTP). [...] We also use 98K images with a set of 1,600 categories of visual instance label annotations from Visual Genome (Krishna et al. 2017) to train our model (for ILP). |
| Dataset Splits | No | The paper mentions using Conceptual Captions, SBU Captions, COCO Captions, and Visual Genome for training, and OVIS40/OVIS1400 for evaluation, but it does not specify any explicit train/validation/test splits for these datasets or reference standard splits for reproducibility. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU models, CPU types, or cloud computing instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions BERT-Base, the AdamW optimizer, and Faster R-CNN, but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | We train our ViSA model for 30 epochs with a batch size of 512 using AdamW optimizer (Loshchilov and Hutter 2019). The learning rate is set to 0.00001. |
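
The experiment setup row above reports only the optimizer, epoch count, batch size, and learning rate. The snippet below is a minimal sketch of that configuration, assuming PyTorch (the paper does not name its training framework); the model, dataset, and loss are hypothetical placeholders, since the authors' code is not released, and only the hyperparameters are taken from the paper.

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.utils.data import DataLoader, TensorDataset

# Placeholder stand-in for the ViSA model; the real architecture is not released.
model = nn.Linear(768, 768)

# Optimizer and learning rate as reported in the paper (AdamW, lr = 0.00001).
optimizer = AdamW(model.parameters(), lr=1e-5)

# Placeholder data; the paper trains on CC, SBU Captions, COCO Captions, and Visual Genome.
data = TensorDataset(torch.randn(2048, 768), torch.randn(2048, 768))
loader = DataLoader(data, batch_size=512, shuffle=True)  # batch size 512 per the paper

for epoch in range(30):  # 30 epochs per the paper
    for x, target in loader:
        loss = nn.functional.mse_loss(model(x), target)  # stand-in loss, not the MTP/ILP objectives
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```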