Fine-Grained Semantically Aligned Vision-Language Pre-Training

Authors: Juncheng Li, Xin He, Longhui Wei, Long Qian, Linchao Zhu, Lingxi Xie, Yueting Zhuang, Qi Tian, Siliang Tang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that LOUPE achieves state-of-the-art performance on a variety of vision-language tasks.
Researcher Affiliation | Collaboration | 1 Zhejiang University, 2 Huawei Cloud
Pseudocode | No | The paper describes its methods in detail through text and mathematical equations, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The repository of this work is at https://github.com/YYJMJC/LOUPE.
Open Datasets | Yes | We compare LOUPE on the widely used MSCOCO [27] and Flickr30K [33] datasets. We compare LOUPE with CLIP on 11 downstream classification datasets... For object detection, we evaluate their mean Average Precision (mAP) at IoU thresholds of {0.3, 0.5} on COCO [27] (65 classes) and PASCAL VOC [11] (20 classes). For visual grounding, we evaluate their top-1 accuracy at an IoU threshold of 0.5 on RefCOCO [51] and RefCOCO+ [51].
Dataset Splits | Yes | For visual grounding, we evaluate their top-1 accuracy at an IoU threshold of 0.5 on RefCOCO [51] and RefCOCO+ [51]. The experiment details of CLIP variants and LOUPE are provided in Appendix E. (Table 3 shows 'val / test A / test B' columns for RefCOCO, indicating a validation set was used for evaluation/reporting. Appendix E also states 'We follow the official split for each dataset and report the standard metrics.')
Hardware Specification | Yes | We pre-train the model for 20 epochs using a batch size of 512 on 128 NVIDIA V100 GPUs.
Software Dependencies | No | The paper mentions the use of specific models (Swin-L, BERT-Small) and an optimizer (AdamW), but does not provide version numbers for general software dependencies such as the programming language (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or CUDA libraries.
Experiment Setup | Yes | We pre-train the model for 20 epochs using a batch size of 512 on 128 NVIDIA V100 GPUs. We utilize the AdamW [29] optimizer with a learning rate of 2 × 10^-4 and a weight decay of 0.01.
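The detection and grounding rows above score predictions by Intersection-over-Union against a threshold (0.3 or 0.5). As a minimal illustration of that criterion (not the paper's evaluation code), IoU for axis-aligned boxes in (x1, y1, x2, y2) form can be computed as:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Intersection rectangle; width/height clamp to 0 when boxes are disjoint.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A prediction counts as a hit at threshold t when iou(pred, gt) >= t,
# e.g. t = 0.5 for the RefCOCO top-1 accuracy quoted above.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))  # intersection 1, union 7
```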
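The Experiment Setup row reports only the learning rate (2 × 10^-4) and weight decay (0.01) for AdamW; a single update step with those values can be sketched in pure Python as follows. The betas and epsilon are the common defaults, not values taken from the paper, and the scalar-parameter form is purely illustrative:

```python
import math

def adamw_step(param, grad, m, v, t, lr=2e-4, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a scalar parameter.

    lr and weight_decay match the paper's reported setup; beta1, beta2,
    and eps are assumed defaults.
    """
    m = beta1 * m + (1 - beta1) * grad       # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: applied to the parameter directly rather
    # than folded into the gradient (AdamW's difference from Adam + L2).
    param = param - lr * (m_hat / (math.sqrt(v_hat) + eps)
                          + weight_decay * param)
    return param, m, v

p, m, v = 1.0, 0.0, 0.0
p, m, v = adamw_step(p, grad=0.5, m=m, v=v, t=1)  # p decreases slightly
```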