Comprehensive Attention Self-Distillation for Weakly-Supervised Object Detection

Authors: Zeyi Huang, Yang Zou, B. V. K. Vijaya Kumar, Dong Huang

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | CASD produces new state-of-the-art WSOD results on standard benchmarks such as PASCAL VOC 2007/2012 and MS-COCO. Systematic ablation studies are also conducted on the effects of transformations and feature layers on CASD.
Researcher Affiliation | Academia | Zeyi Huang, Yang Zou, Vijayakumar Bhagavatula, and Dong Huang, Carnegie Mellon University, Pittsburgh, PA 15213, USA. {zeyih@andrew, yzou2@andrew, kumar@ece, donghuang@}.cmu.edu
Pseudocode | No | The paper describes the proposed method using mathematical formulas and descriptive text but does not include a formally labeled "Pseudocode" or "Algorithm" block.
Open Source Code | Yes | Code is available at https://github.com/DeLightCMU/CASD
Open Datasets | Yes | Three standard WSOD benchmarks, PASCAL VOC 2007, VOC 2012 [41] and MS-COCO [42], are used in our experiments.
Dataset Splits | Yes | For VOC 2007, the total of 9,962 images is split into three subsets: 2,501 for training, 2,510 for validation, and 4,951 for testing. In VOC 2012, the 22,531 images are split into 5,717 training images, 5,823 validation images, and the remaining 10,991 test images. For both datasets, we followed the standard WSOD routine [12, 8, 5, 7] of training on the train+val set and evaluating on the test set. For MS-COCO, the train set (82,783 images) is used for training and the val set (40K images) for testing. (See the split-summary sketch after this table.)
Hardware Specification | No | The paper states that "All experiments were implemented in PyTorch. The VGG16 and ResNet50 pre-trained on ImageNet [43] are used as WSOD backbones," but it does not specify any particular hardware (e.g., GPU models, CPU types) used for training or evaluation.
Software Dependencies | No | The paper mentions that "All experiments were implemented in PyTorch," but it does not provide specific version numbers for PyTorch or any other software libraries or dependencies.
Experiment Setup | Yes | The batch size is set to T, the number of input transformations. The maximum iteration counts are 80K, 160K, and 200K for VOC 2007, VOC 2012, and MS-COCO, respectively. The whole WSOD network is optimized end-to-end by stochastic gradient descent (SGD) with a momentum of 0.9, an initial learning rate of 0.001, and a weight decay of 0.0005. The learning rate decays by a factor of 10 at the 40K-th, 80K-th, and 120K-th iterations for VOC 2007, VOC 2012, and MS-COCO, respectively. The total number of refinement branches K is set to 2. The confidence threshold for Non-Maximum Suppression (NMS) is 0.3. For all experiments, we set α = 0.1, β = 0.05, and γ = 0.1 in the total loss. (See the training-configuration sketch after this table.)
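
The dataset splits reported above can be summarized in a minimal Python sketch. The counts come from the quoted text; the dict layout and names are illustrative, not taken from the paper or its code release:

    # Minimal sketch, assuming plain Python: the split counts quoted above,
    # collected into one dict for reference.
    DATASET_SPLITS = {
        "VOC2007": {"train": 2501, "val": 2510, "test": 4951},    # train+val used for WSOD training
        "VOC2012": {"train": 5717, "val": 5823, "test": 10991},   # train+val used for WSOD training
        "MS-COCO": {"train": 82783, "val": 40000},                # val set used for testing
    }

    # Sanity checks against the totals quoted in the report.
    assert sum(DATASET_SPLITS["VOC2007"].values()) == 9962
    assert sum(DATASET_SPLITS["VOC2012"].values()) == 22531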
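
Similarly, the reported optimizer and schedule map onto a short PyTorch sketch. This is an illustrative reconstruction under the stated hyperparameters, not the authors' released code; the model stand-in is hypothetical:

    # Minimal PyTorch sketch of the reported training configuration. Only
    # the hyperparameter values come from the paper.
    import torch
    import torch.nn as nn

    model = nn.Linear(10, 20)  # placeholder for the VGG16/ResNet50-based WSOD network

    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=0.001,            # initial learning rate
        momentum=0.9,
        weight_decay=0.0005,
    )

    # Learning rate decays by a factor of 10 at 40K (VOC 2007), 80K (VOC 2012),
    # or 120K (MS-COCO) iterations; the VOC 2007 milestone is shown here.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[40_000], gamma=0.1
    )

    # Weights of the loss terms in the paper's total loss.
    alpha, beta, gamma = 0.1, 0.05, 0.1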