Multi-Knowledge Aggregation and Transfer for Semantic Segmentation

Authors: Yuang Liu, Wei Zhang, Jun Wang (pp. 1837-1845)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type Experimental To demonstrate the effectiveness of our proposed approach, we conduct extensive experiments on three segmentation datasets: Pascal VOC, Cityscapes, and CamVid, showing that MKAT outperforms the other KD methods.
Researcher Affiliation Academia East China Normal University, Shanghai, China
Pseudocode No The paper does not contain any clearly labeled pseudocode or algorithm blocks. It describes the methodology in text and mathematical formulations.
Open Source Code No The paper does not contain any statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets Yes Pascal VOC 2012 (Everingham and Winn 2011) contains 20 foreground object classes and an extra background class. Following (Chen et al. 2017a; Zhao et al. 2017), we use the additional annotation provided by (Hariharan et al. 2011), resulting in 10,582 labeled images for training. Cityscapes (Cordts et al. 2016) is for urban scene understanding and contains 30 classes... It contains 2,975 finely annotated images for training... CamVid (Brostow et al. 2008) is an automotive dataset, containing 367 training and 233 testing images...
Dataset Splits Yes Cityscapes (Cordts et al. 2016) is for urban scene understanding and contains 30 classes, of which only 19 are used for evaluation. It contains 2,975 finely annotated images for training, 500 for validation, and 1,525 for testing.
Hardware Specification No The paper states 'Our approach is implemented by PyTorch.' but does not provide any specific details about the hardware used for the experiments (e.g., GPU models, CPU types).
Software Dependencies No The paper mentions 'Our approach is implemented by PyTorch.' but does not specify the version number for PyTorch or list any other software dependencies with their version numbers.
Experiment Setup Yes Training setup. Our approach is implemented in PyTorch. We employ DeepLabV3 with ResNet101 as the teacher network on Cityscapes, and DeepLabV3 with ResNet50 for VOC and CamVid. The student networks use the default DeepLabV3 architecture with compact backbones (i.e., ResNet18 or MobileNet). All models are trained alone or with different KD methods by mini-batch stochastic gradient descent (SGD) with momentum 0.9 and weight decay 0.0005 for 120 epochs. Following (Chen et al. 2017b), we employ a poly learning rate policy in which the initial learning rate is multiplied by (1 - iter/total_iters)^0.9 after each iteration. The initial learning rate of the backbone and encoders is set to 0.007, which is 0.1 times that of the auxiliary head or classifier. The batch size is set to 8, but 4 when testing on Cityscapes due to the high image resolution. For data augmentation, we apply random horizontal flipping and random cropping (crop size 513/768/540 for VOC/Cityscapes/CamVid) during training. The hyperparameters {τ, β} are set to {6.0, 0.5} for CamVid and {10.0, 0.5} for VOC/Cityscapes, respectively. And α is chosen from [10, 15], default 10. The channel dimension of the latent knowledge is set to 256 in all experiments.
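The poly learning rate policy quoted above can be sketched as a small function; the function name `poly_lr` and its signature are ours, not from the paper, and the 0.007 base rate is the backbone/encoder value the setup reports:

```python
def poly_lr(base_lr, iteration, total_iters, power=0.9):
    """Poly LR policy from the quoted setup:
    lr = base_lr * (1 - iteration / total_iters) ** power.
    """
    return base_lr * (1.0 - iteration / total_iters) ** power


# At iteration 0 the learning rate equals the base rate (0.007 here);
# it decays toward 0 as training approaches total_iters.
lr_start = poly_lr(0.007, 0, 10000)      # → 0.007
lr_mid = poly_lr(0.007, 5000, 10000)     # roughly half the base rate
```

In practice this would typically be wrapped in a PyTorch `LambdaLR` scheduler, with a separate parameter group for the auxiliary head/classifier at 10x the backbone rate, per the setup above.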