Masked Distillation with Receptive Tokens

Authors: Tao Huang, Yuan Zhang, Shan You, Fei Wang, Chen Qian, Jian Cao, Chang Xu

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that our MasKD can achieve state-of-the-art performance consistently on object detection and semantic segmentation benchmarks.
Researcher Affiliation | Collaboration | Tao Huang (1,2), Yuan Zhang (3), Shan You (2), Fei Wang (4), Chen Qian (2), Jian Cao (3), Chang Xu (1); (1) School of Computer Science, Faculty of Engineering, The University of Sydney; (2) SenseTime Research; (3) School of Software and Microelectronics, Peking University; (4) University of Science and Technology of China
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/hunto/MasKD.
Open Datasets | Yes | We conduct experiments on MS COCO dataset (Lin et al., 2014)... We conduct experiments on Cityscapes dataset (Cordts et al., 2016)
Dataset Splits | Yes | evaluate the networks with average precision (AP) on COCO val2017 set... evaluate the networks with mean Intersection-over-Union (mIoU) on Cityscapes val and test sets.
Hardware Specification | No | No specific hardware details (like GPU models, CPU types, or memory) are provided for the experimental setup.
Software Dependencies | No | The paper mentions "MMDetection (Chen et al., 2019)" but does not specify a version number for this or any other key software component.
Experiment Setup | Yes | We train our mask tokens for 2000 iterations using an Adam optimizer with 0.001 weight decay, and a cosine learning rate decay is adopted with an initial value of 0.01. ... For the loss weights, we simply set λ1 and λ2 to 1 in Eq.(10) on Faster RCNN-R50 student... We train the models using an SGD optimizer with a momentum of 0.9, and a polynomial annealing learning rate scheduler is adopted with an initial value of 0.02. We train the mask tokens for 2000 iterations in the mask learning stage, and then train the student for 40000 iterations.
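To make the quoted two-stage training configuration concrete, here is a minimal PyTorch sketch of how those hyperparameters could be wired up. It is not the authors' implementation: the mask-token shape, the stand-in student module, the variable names, and the polynomial power (0.9) are assumptions, since the paper's quote only specifies the optimizer types, learning rates, momentum, weight decay, and iteration counts.

```python
import torch
import torch.nn as nn

# Stage 1: mask-token learning (2000 iterations), per the quoted setup:
# Adam optimizer, weight decay 0.001, cosine LR decay from an initial 0.01.
# The token shape (num_tokens x channels) is a placeholder, not from the paper.
mask_tokens = nn.Parameter(torch.randn(6, 256))
stage1_iters = 2000
opt_tokens = torch.optim.Adam([mask_tokens], lr=0.01, weight_decay=0.001)
sched_tokens = torch.optim.lr_scheduler.CosineAnnealingLR(opt_tokens, T_max=stage1_iters)

# Stage 2: student training (40000 iterations), per the quoted setup:
# SGD with momentum 0.9, polynomial annealing LR from an initial 0.02.
# The student module is a stand-in; power=0.9 is an assumption, as the paper
# only says "polynomial annealing". Loss weights lambda1 = lambda2 = 1,
# as stated for the Faster RCNN-R50 student in Eq.(10).
student = nn.Conv2d(3, 19, kernel_size=3, padding=1)
stage2_iters = 40000
opt_student = torch.optim.SGD(student.parameters(), lr=0.02, momentum=0.9)
sched_student = torch.optim.lr_scheduler.PolynomialLR(
    opt_student, total_iters=stage2_iters, power=0.9)
lambda1, lambda2 = 1.0, 1.0

# Each stage would step its optimizer and scheduler once per iteration, e.g.:
# for _ in range(stage1_iters):
#     ...compute the mask-token loss and call loss.backward()...
#     opt_tokens.step(); opt_tokens.zero_grad(); sched_tokens.step()
```

Note that torch.optim.lr_scheduler.PolynomialLR requires a reasonably recent PyTorch release; on older versions the polynomial schedule would have to be implemented manually (e.g., via LambdaLR).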