GLIPv2: Unifying Localization and Vision-Language Understanding
Authors: Haotian Zhang, Pengchuan Zhang, Xiaowei Hu, Yen-Chun Chen, Liunian Harold Li, Xiyang Dai, Lijuan Wang, Lu Yuan, Jenq-Neng Hwang, Jianfeng Gao
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that a single GLIPv2 model (all model weights are shared) achieves near-SoTA performance on various localization and understanding tasks. |
| Researcher Affiliation | Collaboration | University of Washington, Meta AI, Microsoft, UCLA |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is released at https://github.com/microsoft/GLIP. |
| Open Datasets | Yes | GLIPv2-T... is pre-trained on the following data: 1) O365, 2) GoldG as in GLIP-T (C), and 3) Cap4M, 4M image-text pairs collected from the web with boxes generated by GLIP-T [36]. GLIPv2-B/GLIPv2-H... training data contain: 1) FiveODs (2.78M data); 2) GoldG as in MDETR [25]; and 3) CC15M+SBU, 16M public image-text data with generated boxes by GLIP-L [36]. Segmentation heads of GLIPv2 models are pre-trained on COCO, LVIS [20] and PhraseCut [54], with all other model parameters frozen. |
| Dataset Splits | Yes | For LVIS, we report the numbers for both bbox and segm on minival to avoid data contamination due to the pre-training. For COCO-Det test-dev, * indicates multi-scale evaluation. For Flickr30K test, we report the metric under R@1 (a minimal R@1 sketch follows the table). For COCO-Mask, we also report both bbox and segm on test-dev. |
| Hardware Specification | No | The paper does not specify concrete hardware details such as exact GPU or CPU models used for experiments. It only vaguely mentions 'providing computer resources for large-scale training' in the acknowledgements. |
| Software Dependencies | No | The paper mentions software components and architectures like Swin Transformer, BERT-Base, and Dynamic Head, but it does not provide specific version numbers for these or other software dependencies (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | No | The paper states, 'Due to limited space, we refer to supplementary for details of training recipes and hyper-parameters.' Therefore, specific experimental setup details like hyperparameter values are not provided in the main text. |
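
For context on the Flickr30K R@1 figure cited in the Dataset Splits row: phrase-grounding recall is conventionally scored by checking whether the top-1 predicted box for a phrase overlaps any ground-truth box with IoU ≥ 0.5. The sketch below illustrates that convention only; the function names and the (x1, y1, x2, y2) box format are assumptions for illustration, not code from the GLIP repository.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_boxes, gold_boxes_per_phrase, iou_thresh=0.5):
    """Fraction of phrases whose top-1 predicted box matches a gold box.

    top1_boxes: one predicted box per phrase.
    gold_boxes_per_phrase: list of gold boxes for the same phrases.
    A phrase counts as a hit if its top-1 box reaches IoU >= iou_thresh
    with at least one gold box (the usual R@1 grounding convention).
    """
    hits = 0
    for pred, golds in zip(top1_boxes, gold_boxes_per_phrase):
        if any(box_iou(pred, g) >= iou_thresh for g in golds):
            hits += 1
    return hits / len(top1_boxes)


if __name__ == "__main__":
    # Toy example with two phrases: one hit, one miss -> R@1 = 0.5.
    preds = [(10, 10, 50, 50), (0, 0, 20, 20)]
    golds = [[(12, 12, 48, 52)], [(100, 100, 140, 140)]]
    print(recall_at_1(preds, golds))
```
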