reproducibilityindex.ai

CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction

Authors: Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Xiangtai Li, Wentao Liu, Chen Change Loy

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The effectiveness of CLIPSelf is validated on open-vocabulary object detection and image segmentation benchmarks. For open-vocabulary object detection, we established a two-stage baseline based on frozen CLIP ViTs, and the fine-tuned models achieved new state-of-the-art performance on OV-COCO and OV-LVIS benchmarks, as well as on the transfer detection benchmark.
Researcher Affiliation	Collaboration	1 S-Lab, Nanyang Technological University 2 The Chinese University of Hong Kong 3 The University of Hong Kong 4 SenseTime Research and Tetras.AI 5 Shanghai AI Laboratory
Pseudocode	No	The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Models and code are released at https://github.com/wusize/CLIPSelf.
Open Datasets	Yes	By default, we use the images in train2017 split of COCO dataset (Lin et al., 2014), which are exactly the training images of most downstream open-vocabulary benchmarks. ... For the OV-LVIS benchmark, we use the images from the train split of LVIS v1.0 (Gupta et al., 2019).
Dataset Splits	Yes	The mean accuracy (mAcc) of classifying region boxes annotated in COCO's val2017 split is used as the indicator for evaluation.
Hardware Specification	Yes	To train CLIPSelf, we use 8 A100 GPUs and set the batch size as 2 on each GPU.
Software Dependencies	No	The paper mentions using the AdamW optimizer but does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages.
Experiment Setup	Yes	To train CLIPSelf, we use 8 A100 GPUs and set the batch size as 2 on each GPU. We train the models for 6 epochs using the AdamW (Loshchilov & Hutter, 2017) optimizer with a learning rate of 1e-5 and weight decay of 0.1.
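
The reported experiment setup can be collected into a small configuration object, which makes the implied global batch size explicit. This is a minimal sketch, not the authors' code: the `TrainConfig` class, its field names, and the `steps_per_epoch` helper are hypothetical, and the global batch size assumes plain data-parallel training with no gradient accumulation, which the quoted text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters quoted in the Experiment Setup row (names are hypothetical)."""
    num_gpus: int = 8             # 8 A100 GPUs
    batch_size_per_gpu: int = 2   # batch size of 2 on each GPU
    epochs: int = 6
    optimizer: str = "AdamW"
    learning_rate: float = 1e-5
    weight_decay: float = 0.1

    @property
    def global_batch_size(self) -> int:
        # Assumption: data-parallel training without gradient accumulation.
        return self.num_gpus * self.batch_size_per_gpu

    def steps_per_epoch(self, num_images: int) -> int:
        # Ceiling division; assumes the last partial batch is kept.
        return -(-num_images // self.global_batch_size)


cfg = TrainConfig()
print(cfg.global_batch_size)      # 16
print(cfg.steps_per_epoch(1000))  # 63, for a hypothetical 1000-image dataset
```

Software versions are not included because the paper does not report them.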