Cloud Object Detector Adaptation by Integrating Different Source Knowledge

Authors: Shuaifeng Li, Mao Ye, Lihua Zhou, Nianxin Li, Siying Xiao, Song Tang, Xiatian Zhu

Venue: NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experiment results demonstrate that the proposed COIN method achieves state-of-the-art performance. |
| Researcher Affiliation | Academia | 1. University of Electronic Science and Technology of China; 2. University of Shanghai for Science and Technology; 3. University of Surrey |
| Pseudocode | Yes | Algorithm 1: Our proposed COIN method. |
| Open Source Code | Yes | https://github.com/Flashkong/COIN |
| Open Datasets | Yes | Specifically, we validate the effectiveness of the proposed COIN method on six object detection datasets, e.g., Cityscapes [11], Foggy-Cityscapes [11], Clipart [25], BDD100K [63], KITTI [16] and Sim10K [26]. |
| Dataset Splits | No | Cityscapes [11] consists of 2,975 training images and 500 testing images... Foggy-Cityscapes [11] contains three levels of foggy images simulated from the images of Cityscapes: 2,975 training images and 500 testing images... For comparison with existing methods, we follow [35, 14], and use 36,728 training images and 5,258 testing images with 7 classes for training and testing respectively. KITTI [16] contains 7,481 urban images with the car category. We use all the images for training and testing. Sim10K [26] contains 10K images collected from the computer game Grand Theft Auto V with the car category. All images are used for training and testing. |
| Hardware Specification | Yes | One 3090 GPU, a batch size of 3 and a random seed of 2024 are used for all experiments. |
| Software Dependencies | No | One 3090 GPU, a batch size of 3 and a random seed of 2024 are used for all experiments. SGD [2] is used as the optimizer, where the initial learning rate is 0.001 and the weight decay is 0.0001. |
| Experiment Setup | Yes | The hyperparameters γ1, γ2 and π are set to 0.1, 0.1 and 0.7 by default. The shorter side of the image is resized to 600 during training and testing, and the reported mean average precision (mAP) is based on an IoU threshold of 0.5. For pre-training the CLIP detector, we iterate 50K steps. For knowledge distillation, we generally iterate 45K steps using Eq. 17, and then iterate 20K steps using Eq. 18. |
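
For readers attempting reproduction, the settings reported in the Hardware, Software Dependencies and Experiment Setup rows can be collected into a single configuration sketch. The snippet below is a minimal illustration assuming a PyTorch-style pipeline; the `CFG` dictionary, `set_seed` helper and placeholder model are our own naming, not the COIN repository's actual API.

```python
import random
import numpy as np
import torch
from torch.optim import SGD

# Values taken verbatim from the paper's reported setup; all names here are
# illustrative placeholders, not identifiers from the COIN codebase.
CFG = {
    "seed": 2024,                  # random seed used for all experiments
    "batch_size": 3,               # batch size on a single RTX 3090
    "lr": 1e-3,                    # initial SGD learning rate
    "weight_decay": 1e-4,          # SGD weight decay
    "gamma1": 0.1,                 # hyperparameter γ1
    "gamma2": 0.1,                 # hyperparameter γ2
    "pi": 0.7,                     # hyperparameter π
    "short_side": 600,             # shorter image side during train/test
    "pretrain_steps": 50_000,      # CLIP-detector pre-training iterations
    "distill_steps_eq17": 45_000,  # distillation iterations using Eq. 17
    "distill_steps_eq18": 20_000,  # follow-up iterations using Eq. 18
}

def set_seed(seed: int) -> None:
    """Fix RNG state for repeatable runs (the paper fixes seed 2024)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(CFG["seed"])

# Hypothetical stand-in model; any torch.nn.Module would slot in here.
model = torch.nn.Conv2d(3, 8, kernel_size=3)
optimizer = SGD(model.parameters(), lr=CFG["lr"],
                weight_decay=CFG["weight_decay"])
```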
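The Experiment Setup row states that mAP is computed at an IoU threshold of 0.5. The sketch below shows what that threshold means for a single predicted box against a ground-truth box; it is a generic IoU computation in (x1, y1, x2, y2) format, not code from the paper.

```python
def iou_xyxy(a, b):
    """Intersection-over-union for two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Under mAP@0.5, a detection can only count as a true positive
# if its IoU with a ground-truth box reaches the 0.5 threshold.
pred, gt = (10, 10, 60, 60), (20, 20, 70, 70)
print(iou_xyxy(pred, gt) >= 0.5)  # False here: IoU ≈ 0.47
```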