UWSOD: Toward Fully-Supervised-Level Capacity Weakly Supervised Object Detection

Authors: Yunhang Shen, Rongrong Ji, Zhiwei Chen, Yongjian Wu, Feiyue Huang

NeurIPS 2020

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on PASCAL VOC and MS COCO show that the proposed UWSOD achieves competitive results with the state-of-the-art WSOD methods while not requiring external modules or additional supervision.
Researcher Affiliation | Collaboration | Yunhang Shen (1), Rongrong Ji (1,2,3), Zhiwei Chen (1), Yongjian Wu (4), Feiyue Huang (4); (1) Media Analytics and Computing Lab, Department of Artificial Intelligence, School of Informatics, Xiamen University, 361005, China; (2) Institute of Artificial Intelligence, Xiamen University, 361005, China; (3) Peng Cheng Laboratory, Shenzhen, China; (4) Tencent Youtu Lab, Shanghai, China
Pseudocode | No | The paper describes the methods in text and via figures, but no structured pseudocode or algorithm blocks are present.
Open Source Code | Yes | The code is available at: https://github.com/shenyunhang/UWSOD.
Open Datasets | Yes | We evaluate the proposed design principles on PASCAL VOC 2007, 2012 [86] and MS COCO [87], which are widely-used benchmark datasets.
Dataset Splits | Yes | PASCAL VOC 2007 consists of 5,011 trainval images and 4,952 test images over 20 categories. PASCAL VOC 2012 consists of 11,540 trainval images and 10,991 test images over 20 categories. For MS COCO, the experiments use the 118k training set with image-level labels for training and the 5k validation set for testing.
Hardware Specification | No | The paper mentions 'We use synchronized SGD training on 4 GPUs' but does not specify the model or type of GPUs, or any other specific hardware components such as CPUs or memory.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow) or CUDA.
Experiment Setup | Yes | We use synchronized SGD training on 4 GPUs. A mini-batch involves 1 image per GPU. We use a step learning-rate decay schema with a decay weight of 0.1 and a step size of 140,000 iterations. The total number of training iterations is 200,000. We adopt a 2x training schedule for MS COCO. In the multi-scale setting, we use scales ranging from 480 to 1216 with stride 32. To improve robustness, we randomly adjust the exposure and saturation of the images by up to a factor of 1.5 in the HSV space. A random crop with 0.9 of the size of the original images is applied. We freeze all pre-trained convolutional layers in backbones unless specified otherwise. The test scores are the average over scales of {480, 576, 672, 768, 864, 960, 1056, 1152} and flips. Detection results are post-processed by NMS with a threshold of 0.5. We set the labeling thresholds λobn and λp to 0.5 and 0.7, respectively. For SWBBFT, we set the number of fine-tune branches nf to 4, and λf to {0.3, 0.4, 0.5, 0.6}. We apply MRRP on the last stage of the backbone with nm = 3 and αm = {1, 2, 4}.
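The step learning-rate decay quoted above (decay weight 0.1 applied every 140,000 iterations, 200,000 iterations total) can be sketched as a small helper. This is a minimal illustration, not the authors' implementation; the base learning rate of 0.01 is an assumption, since this excerpt does not quote it.

```python
def step_lr(iteration, base_lr=0.01, decay=0.1, step_size=140_000):
    """Learning rate at a given SGD iteration under a step decay schedule.

    Multiplies the (assumed) base learning rate by `decay` once for every
    completed `step_size` iterations, matching the quoted setup: decay
    weight 0.1, step size 140,000, over 200,000 total iterations.
    """
    num_decays = iteration // step_size
    return base_lr * (decay ** num_decays)
```

With these settings, the rate stays at the base value until iteration 140,000 and is decayed exactly once before training ends at 200,000 iterations.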