Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism
Authors: Chengcheng Wang, Wei He, Ying Nie, Jianyuan Guo, Chuanjian Liu, Yunhe Wang, Kai Han
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive experiments on the Microsoft COCO dataset to validate the proposed detector. For the ablation study, we train on COCO train2017 and validate on COCO val2017. We use the standard COCO AP metric with a single-scale image as input, and report the standard mean average precision (AP) under different IoU thresholds and object scales. (A minimal evaluation sketch appears after the table.) |
| Researcher Affiliation | Industry | Huawei Noah's Ark Lab {wangchengcheng11,hewei142,ying.nie,jianyuan.guo,liuchuanjian,kai.han,yunhe.wang}@huawei.com |
| Pseudocode | No | The paper includes mathematical formulas and block diagrams (e.g., Figure 4) to describe the proposed mechanism and its components, but it does not contain a dedicated pseudocode block or algorithm listing. |
| Open Source Code | Yes | The PyTorch code is available at https://github.com/huawei-noah/Efficient-Computing/tree/master/Detection/Gold-YOLO, and the MindSpore code is available at https://gitee.com/mindspore/models/tree/master/research/cv/Gold_YOLO. |
| Open Datasets | Yes | We perform extensive experiments on the Microsoft COCO dataset to validate the proposed detector. For the ablation study, we train on COCO train2017 and validate on COCO val2017. We conducted MIM unsupervised pretraining on the backbone using the ImageNet-1K dataset of 1.28 million images [8]. |
| Dataset Splits | Yes | For the ablation study, we train on COCO train2017 and validate on COCO val2017. We use the standard COCO AP metric with a single-scale image as input, and report the standard mean average precision (AP) under different IoU thresholds and object scales. |
| Hardware Specification | Yes | All our models are trained on 8 NVIDIA A100 GPUs, and the speed performance is measured on an NVIDIA Tesla T4 GPU with TensorRT. FPS and latency are measured in FP16 precision on a Tesla T4 in the same environment with TensorRT 7. FPS and latency are measured in FP16 precision on a Tesla T4 in the same environment with TensorRT 8.2. FPS and latency are measured in FP16 precision on a Tesla V100 in the same environment with TensorRT 7.2. (A hedged latency-measurement sketch appears after the table.) |
| Software Dependencies | Yes | FPS and latency are measured in FP16 precision on a Tesla T4 in the same environment with TensorRT 7. All our models are trained for 300 epochs. Both the accuracy and the speed performance of our models are evaluated with an input resolution of 640x640. FPS and latency are measured in FP16 precision on a Tesla T4 in the same environment with TensorRT 8.2. FPS and latency are measured in FP16 precision on a Tesla V100 in the same environment with TensorRT 7.2. |
| Experiment Setup | Yes | We followed the setup of YOLOv6-3.0 [32], using the same structure (except for the neck) and training configurations. The backbone of the network was implemented with the EfficientRep backbone, while the head used the Efficient Decoupled Head. The optimizer, learning schedule, and other settings are also the same as in YOLOv6, i.e., stochastic gradient descent (SGD) with momentum and cosine decay of the learning rate. Warm-up, a grouped weight-decay strategy, and an exponential moving average (EMA) of the weights are utilized. Self-distillation and anchor-aided training (AAT) are also used during training. For strong data augmentation we adopt Mosaic [2, 13] and Mixup [58]. Following the experiment settings in SparK [46], we employed a LAMB optimizer [55] and a cosine-annealing learning-rate strategy, with a masking ratio of 60% and a mask patch size of 32. For the Gold-YOLO-L models we employed a batch size of 1024, while for the Gold-YOLO-M models a batch size of 1152 was used. (A hedged sketch of the optimizer/schedule/EMA setup appears after the table.) |
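
The Research Type and Dataset Splits rows quote the paper's use of the standard COCO AP protocol on val2017. As a minimal, hedged sketch of what that evaluation looks like with the usual pycocotools flow (the file paths and detections filename here are placeholders, not artifacts from the paper):

```python
# Hedged sketch: standard COCO AP evaluation on val2017 via pycocotools.
# ann_file and det_file are hypothetical paths, not taken from the paper.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

ann_file = "annotations/instances_val2017.json"    # COCO val2017 ground truth
det_file = "gold_yolo_val2017_detections.json"     # model outputs in COCO JSON format

coco_gt = COCO(ann_file)
coco_dt = coco_gt.loadRes(det_file)                # load detections against the ground truth

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()   # prints AP at IoU 0.50:0.95, 0.50, 0.75, and per object scale
```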
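The Hardware Specification and Software Dependencies rows describe FP16 latency and FPS measured with TensorRT engines on a Tesla T4. The paper's exact TensorRT harness is not reproduced here; the following is a PyTorch-side stand-in using CUDA events, which approximates the measurement protocol (batch 1, FP16, warm-up before timing) but will not match TensorRT's absolute numbers:

```python
# Hedged sketch: FP16 batch-1 latency/FPS measurement with CUDA events.
# This is a PyTorch approximation of the TensorRT protocol the paper uses.
import torch

def measure_latency(model, input_res=640, warmup=50, iters=200):
    model = model.half().eval().cuda()
    x = torch.randn(1, 3, input_res, input_res, device="cuda", dtype=torch.half)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    with torch.no_grad():
        for _ in range(warmup):           # warm up kernels and allocator caches
            model(x)
        torch.cuda.synchronize()
        start.record()
        for _ in range(iters):
            model(x)
        end.record()
        torch.cuda.synchronize()
    latency_ms = start.elapsed_time(end) / iters
    return latency_ms, 1000.0 / latency_ms   # (mean latency in ms, FPS at batch 1)
```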
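The Experiment Setup row lists SGD with momentum, cosine learning-rate decay, warm-up, grouped weight decay, and an EMA of the weights. Below is a minimal sketch of such a configuration; all hyperparameter values (base lr, momentum, decay rates, warm-up length) are illustrative assumptions rather than values confirmed by the paper, and the SparK-style MIM pretraining with LAMB is not sketched:

```python
# Hedged sketch: SGD + momentum, warm-up + cosine decay over 300 epochs,
# grouped weight decay, and an EMA of the weights. Values are assumptions.
import copy
import math
import torch

def build_optimizer(model, lr=0.01, momentum=0.937, weight_decay=5e-4):
    # Grouped weight decay: no decay on biases and norm parameters (ndim <= 1).
    decay, no_decay = [], []
    for p in model.parameters():
        (no_decay if p.ndim <= 1 else decay).append(p)
    return torch.optim.SGD(
        [{"params": decay, "weight_decay": weight_decay},
         {"params": no_decay, "weight_decay": 0.0}],
        lr=lr, momentum=momentum, nesterov=True)

def lr_lambda(epoch, epochs=300, warmup=3, final_ratio=0.01):
    # Linear warm-up, then cosine decay down to final_ratio * base lr.
    if epoch < warmup:
        return (epoch + 1) / warmup
    t = (epoch - warmup) / max(1, epochs - warmup)
    return final_ratio + 0.5 * (1 - final_ratio) * (1 + math.cos(math.pi * t))

class ModelEMA:
    """Exponential moving average of model weights, updated each step."""
    def __init__(self, model, decay=0.9999):
        self.ema = copy.deepcopy(model).eval()
        self.decay = decay
        for p in self.ema.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        msd = model.state_dict()
        for k, v in self.ema.state_dict().items():
            if v.dtype.is_floating_point:
                v.mul_(self.decay).add_(msd[k], alpha=1 - self.decay)

# Usage: step the scheduler once per epoch, the EMA once per optimizer step.
# optimizer = build_optimizer(model)
# scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
# ema = ModelEMA(model)
```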