HASSOD: Hierarchical Adaptive Self-Supervised Object Detection
Authors: Shengcao Cao, Dhiraj Joshi, Liang-Yan Gui, Yu-Xiong Wang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on prevalent image datasets, we demonstrate the superiority of HASSOD over existing methods, thereby advancing the state of the art in self-supervised object detection. |
| Researcher Affiliation | Collaboration | Shengcao Cao1 Dhiraj Joshi2 Liang-Yan Gui1 Yu-Xiong Wang1 1University of Illinois at Urbana-Champaign 2IBM Research 1{cao44,lgui,yxw}@illinois.edu 2djoshi@us.ibm.com |
| Pseudocode | No | The paper describes procedures using descriptive text and flowcharts (e.g., Figure 2, Figure 3), but it does not contain structured pseudocode or algorithm blocks clearly labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Project page: https://HASSOD-NeurIPS23.github.io. |
| Open Datasets | Yes | We train a Cascade Mask R-CNN [4] with a ResNet-50 [13] backbone on MS-COCO [20] images. The backbone is initialized from DINO [5] self-supervised pre-training... We mainly conduct our experiments in a zero-shot manner on the validation sets of three benchmark datasets, namely Objects365 [27], LVIS [11], and SA-1B [18]. |
| Dataset Splits | Yes | We train a Cascade Mask R-CNN [4] with a ResNet-50 [13] backbone on MS-COCO [20] images. We use both the train and unlabeled splits of MS-COCO, totaling to about 0.24 million images. We mainly conduct our experiments in a zero-shot manner on the validation sets of three benchmark datasets, namely Objects365 [27], LVIS [11], and SA-1B [18]. As SA-1B does not provide a validation split, we utilize a random subset of 50,000 images for our assessment. |
| Hardware Specification | Yes | The whole training process spans 40,000 iterations, taking about 20 hours on 4 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper states 'Our code is developed based on PyTorch [24] and Detectron2 [40]', but it does not specify version numbers for either PyTorch or Detectron2, nor for any other ancillary software components. |
| Experiment Setup | Yes | The whole training process starts with a burn-in stage, during which the student model is only trained on the initial pseudo-labels with a fixed learning rate 0.01 and fixed loss weights. After the burn-in stage, the teacher model is introduced, and we gradually adjust the learning rate from 0.01 to 0, the loss weight in the label-to-student branch from 1.0 to 0.0, and the loss weight in the teacher-to-student branch from 2.0 to 3.0, all following a cosine schedule. The whole training process spans 40,000 iterations with a batch size of 16 images. We resize the resolution of each image to 480 × 480... The merging process stops at three thresholds θ_1^merge = 0.4, θ_2^merge = 0.2, θ_3^merge = 0.1... The coverage threshold is set to θ_cover = 90%. |
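For readers reimplementing the Experiment Setup row, the cosine adjustment of the learning rate and loss weights can be sketched as follows. This is a minimal interpretation, not the authors' released code: the function name `cosine_interp`, the step offsets, and the burn-in handling are assumptions; only the start/end values (0.01 → 0, 1.0 → 0.0, 2.0 → 3.0) and the 40,000-iteration total come from the paper.

```python
import math

def cosine_interp(start, end, step, total_steps):
    """Interpolate from `start` to `end` over `total_steps` with a cosine schedule."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return end + (start - end) * 0.5 * (1.0 + math.cos(math.pi * progress))

# Values reported in the paper (applied after the burn-in stage):
total = 40_000
step = 20_000  # example: halfway through training
lr = cosine_interp(0.01, 0.0, step, total)        # learning rate: 0.01 -> 0
w_label = cosine_interp(1.0, 0.0, step, total)    # label-to-student loss weight: 1.0 -> 0.0
w_teacher = cosine_interp(2.0, 3.0, step, total)  # teacher-to-student loss weight: 2.0 -> 3.0
```

At the halfway point each quantity sits at the midpoint of its range (e.g. the learning rate is 0.005), and at the final step each reaches its end value.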