Patch-level Contrastive Learning via Positional Query for Visual Pre-training
Authors: Shaofeng Zhang, Qiang Zhou, Zhibin Wang, Fan Wang, Junchi Yan
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments on standard visual benchmarks, including linear probing, finetuning on classification, detection and segmentation, where the proposed PQCL can stably improve the baselines DINO and iBOT. |
| Researcher Affiliation | Collaboration | Shaofeng Zhang 1 Qiang Zhou 2 Zhibin Wang 2 Fan Wang 2 Junchi Yan 1 1Department of CSE, and MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University 2Alibaba Group. |
| Pseudocode | No | The paper describes the method using equations and figures, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | Code is available at https://github.com/Sherrylone/Query_Contrastive. |
| Open Datasets | Yes | We conduct self-supervised pre-training on the ImageNet-1K (Deng et al., 2009) training set with 1,000 classes... We also transfer the encoder pre-trained by PQCL to MS-COCO (Lin et al., 2014), ADE20K (Zhou et al., 2017), and the video segmentation dataset DAVIS 2017 (Pont-Tuset et al., 2017). |
| Dataset Splits | No | The paper utilizes well-known datasets like ImageNet-1K, MS-COCO, and ADE20K, which have standard splits. However, it does not explicitly state the specific percentages, sample counts, or methodology used for the training/validation/test splits in the context of its own experiments, nor does it explicitly cite which predefined validation splits were used. |
| Hardware Specification | Yes | The experiments are performed on a workstation with 16 V100 GPUs by default (if not otherwise specified). |
| Software Dependencies | No | The paper mentions the AdamW optimizer, but does not provide version numbers for software dependencies such as the programming language, deep learning framework (e.g., PyTorch, TensorFlow), or other libraries used for the experiments. |
| Experiment Setup | Yes | We train with AdamW... and a batch size of 1024... The learning rate is linearly ramped up during the first 30 epochs to its base value... lr = 0.0005 × batchsize / 256. After warmup, we decay the learning rate with a cosine schedule... The weight decay also follows a cosine schedule from 0.04 to 0.4. The temperature τ is set to 0.04, while we use a linear warm-up for τt from 0.04 to 0.07 during the first 30 epochs... For both baselines DINO and iBOT, the query crop ratio is randomly sampled from 0.05 to 0.25... For the baseline iBOT, we set the mask ratio of global views to 0.3. |
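
For concreteness, below is a minimal sketch of the warmup-plus-cosine schedules described in the Experiment Setup row, in the style of common self-supervised training code. The `cosine_scheduler` helper, the final learning-rate value of 1e-6, and the 100-epoch run length are illustrative assumptions and are not taken from the paper; only the base values (lr = 0.0005 × batchsize / 256, weight decay 0.04 → 0.4, teacher temperature 0.04 → 0.07 over 30 epochs) come from the quoted setup.

```python
import numpy as np

def cosine_scheduler(base_value, final_value, epochs, iters_per_epoch,
                     warmup_epochs=0, warmup_start_value=0.0):
    """Per-iteration schedule: linear warmup followed by a cosine ramp to final_value."""
    warmup_iters = warmup_epochs * iters_per_epoch
    warmup = np.linspace(warmup_start_value, base_value, warmup_iters)

    iters = np.arange(epochs * iters_per_epoch - warmup_iters)
    cosine = final_value + 0.5 * (base_value - final_value) * (
        1 + np.cos(np.pi * iters / len(iters)))
    return np.concatenate((warmup, cosine))

# Assumed run configuration (100 epochs; ImageNet-1K has 1,281,167 training images).
batch_size, epochs = 1024, 100
iters_per_epoch = 1281167 // batch_size

# Linear scaling rule from the paper: lr = 0.0005 * batchsize / 256,
# with a 30-epoch linear warmup and cosine decay afterwards (final lr assumed 1e-6).
base_lr = 0.0005 * batch_size / 256
lr_schedule = cosine_scheduler(base_lr, 1e-6, epochs, iters_per_epoch, warmup_epochs=30)

# Weight decay follows a cosine schedule from 0.04 up to 0.4.
wd_schedule = cosine_scheduler(0.04, 0.4, epochs, iters_per_epoch)

# Teacher temperature: linear warmup from 0.04 to 0.07 over the first 30 epochs,
# then held constant (one value per epoch).
teacher_temp = np.concatenate((np.linspace(0.04, 0.07, 30),
                               np.full(epochs - 30, 0.07)))
```

In a training loop, `lr_schedule[it]` and `wd_schedule[it]` would be written into the optimizer's parameter groups at each iteration, and `teacher_temp[epoch]` would be used when sharpening the teacher outputs, mirroring how DINO-style codebases apply these schedules.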