Cloud Object Detector Adaptation by Integrating Different Source Knowledge
Authors: Shuaifeng Li, Mao Ye, Lihua Zhou, Nianxin Li, Siying Xiao, Song Tang, Xiatian Zhu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment results demonstrate that the proposed COIN method achieves the state-of-the-art performance. |
| Researcher Affiliation | Academia | ¹University of Electronic Science and Technology of China; ²University of Shanghai for Science and Technology; ³University of Surrey |
| Pseudocode | Yes | Algorithm 1 Our proposed COIN method. |
| Open Source Code | Yes | https://github.com/Flashkong/COIN |
| Open Datasets | Yes | Specifically, we validate the effectiveness of the proposed COIN method on six object detection datasets, i.e., Cityscapes [11], Foggy-Cityscapes [11], Clipart [25], BDD100K [63], KITTI [16] and Sim10K [26]. |
| Dataset Splits | No | Cityscapes [11] consists of 2,975 training images and 500 testing images... Foggy-Cityscapes [11] contains three levels of foggy images simulated by the images of Cityscapes. 2,975 training images and 500 testing images... For comparison with existing methods, we follow [35, 14], and use 36,728 training images and 5,258 testing images with 7 classes for training and testing respectively. KITTI [16] contains 7,481 urban images with the car category. We use all the images for training and testing. Sim10K [26] contains 10K images collected from the computer game Grand Theft Auto V with the car category. All images are used for training and testing. |
| Hardware Specification | Yes | One 3090 GPU, a batch size of 3 and a random seed of 2024 are used for all experiments. |
| Software Dependencies | No | One 3090 GPU, a batch size of 3 and a random seed of 2024 are used for all experiments. SGD [2] is used as the optimizer where the initial learning rate is 0.001 and the weight decay is 0.0001. (A seeding/optimizer sketch follows the table.) |
| Experiment Setup | Yes | The hyperparameters γ1, γ2 and π are set to 0.1, 0.1 and 0.7 by default. The shorter side of the image is resized to 600 during training and testing, and the reported mean average precision (mAP) is based on an IoU threshold of 0.5. For pre-training the CLIP detector, we iterate 50K steps. For knowledge distillation, we generally iterate 45K steps using Eq. 17, and then iterate 20K steps using Eq. 18. (A configuration sketch follows the table.) |
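For concreteness, the seed, batch size, and optimizer settings quoted in the Hardware Specification and Software Dependencies rows translate to roughly the following PyTorch setup. This is a minimal sketch, not the authors' code: `set_seed` is a hypothetical helper, and the `detector` module is a dummy placeholder standing in for the COIN model. The official implementation at https://github.com/Flashkong/COIN is authoritative.

```python
import random

import numpy as np
import torch

SEED = 2024       # random seed reported in the paper
BATCH_SIZE = 3    # batch size reported in the paper


def set_seed(seed: int) -> None:
    """Seed Python, NumPy, and PyTorch RNGs (hypothetical helper)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)


set_seed(SEED)

# Placeholder module standing in for the COIN detector so the sketch runs.
detector = torch.nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)

# SGD with the reported initial learning rate and weight decay.
optimizer = torch.optim.SGD(
    detector.parameters(),
    lr=0.001,
    weight_decay=0.0001,
)
```

Seeding all three RNG sources (Python, NumPy, CUDA) is the usual practice for single-GPU reproducibility, which is presumably what the paper's fixed seed of 2024 targets; note that full bitwise determinism on GPU may additionally require deterministic kernels.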
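Likewise, the values quoted in the Experiment Setup row can be gathered into a single configuration object. Only the numbers below come from the paper; the field names are illustrative assumptions and do not necessarily match the identifiers used in the official repository.

```python
from dataclasses import dataclass

from torchvision import transforms


@dataclass(frozen=True)
class COINSetup:
    """Values from the Experiment Setup row; field names are illustrative."""

    gamma1: float = 0.1               # loss weight γ1
    gamma2: float = 0.1               # loss weight γ2
    pi: float = 0.7                   # hyperparameter π
    shorter_side: int = 600           # shorter image side at train/test time
    map_iou_threshold: float = 0.5    # mAP reported at an IoU threshold of 0.5
    clip_pretrain_steps: int = 50_000  # CLIP detector pre-training iterations
    distill_steps_eq17: int = 45_000   # distillation stage using Eq. 17
    distill_steps_eq18: int = 20_000   # distillation stage using Eq. 18


cfg = COINSetup()

# With torchvision, passing an integer to Resize scales the shorter side to
# that size, matching the "shorter side resized to 600" protocol.
resize = transforms.Resize(cfg.shorter_side)
```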