Decomposed Knowledge Distillation for Class-Incremental Semantic Segmentation

Authors: Donghyeon Baek, Youngmin Oh, Sanghoon Lee, Junghyup Lee, Bumsub Ham

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on standard CISS benchmarks demonstrate the effectiveness of our framework. We demonstrate the effectiveness of our framework with extensive ablation studies on standard CISS benchmarks [11, 30].
Researcher Affiliation | Academia | 1 Yonsei University, 2 Korea Institute of Science and Technology (KIST)
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Our code and models are available online: https://cvlab.yonsei.ac.kr/projects/DKD/
Open Datasets | Yes | We use PASCAL VOC [11] and ADE20K [30] datasets for evaluation. PASCAL VOC [11] consists of 10,582 training and 1,449 validation images for 20 object and background classes. ADE20K [30] provides 20,210 and 2,000 images for training and validation, respectively, with 150 object and stuff classes.
Dataset Splits | Yes | PASCAL VOC [11] consists of 10,582 training and 1,449 validation images for 20 object and background classes. ADE20K [30] provides 20,210 and 2,000 images for training and validation, respectively, with 150 object and stuff classes. Following the protocol in [5], we use official validation splits for evaluation. We also exclude 20% of training sets, and use them to tune hyper-parameters.
Hardware Specification | Yes | We implement our model using PyTorch [24] and train it with four NVIDIA RTX A5000 GPUs.
Software Dependencies | No | We implement our model using PyTorch [24].
Experiment Setup | Yes | For PASCAL VOC [11], we train our model with 60 epochs for both initial and incremental steps, with a batch size of 32. We adopt a polynomial learning rate scheduler, where learning rates are set to 0.001 and 0.0001 for initial and incremental steps, respectively. We empirically set γ to 2 during an initial step and 1 for others. For ADE20K [30], we train our model for 100 epochs with a batch size of 24. Following [6], we adopt the poly learning rate scheduler with a linear warm-up [12], where learning rates are set to 0.0025 for an initial step and 0.00025 for incremental ones, respectively. We set γ to 35 for all training steps. For both datasets, we adopt the SGD optimizer with momentum of 0.9, and set α and β to 5.
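
The Dataset Splits row notes that 20% of each official training set is held out for hyper-parameter tuning. The snippet below is a minimal sketch of such a holdout split, not the authors' released code; the function name, the fixed random seed, and the way the 20% subset is drawn are illustrative assumptions.

```python
# Minimal sketch (not from the paper's code release): hold out 20% of the
# official training image IDs for hyper-parameter tuning, train on the rest.
import random

def split_train_for_tuning(train_ids, holdout_ratio=0.2, seed=0):
    """Return (train_subset, tuning_subset) drawn from the official training IDs."""
    ids = list(train_ids)
    random.Random(seed).shuffle(ids)           # fixed seed keeps the split reproducible
    n_holdout = int(len(ids) * holdout_ratio)  # 20% reserved for tuning
    return ids[n_holdout:], ids[:n_holdout]

# Example with the PASCAL VOC training-set size quoted above (10,582 images).
train_subset, tuning_subset = split_train_for_tuning(range(10582))
assert len(tuning_subset) == 2116  # roughly 20% of 10,582
```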
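The Experiment Setup row reports SGD with momentum 0.9 and a polynomial learning-rate schedule, with a linear warm-up on ADE20K. Below is a minimal PyTorch sketch of that optimizer and scheduler configuration for the ADE20K initial step; the placeholder model, the warm-up length, and the polynomial power are assumptions not stated in the quoted text.

```python
# Minimal sketch of the reported ADE20K schedule: SGD (momentum 0.9), base LR
# 0.0025 for the initial step, polynomial decay with a linear warm-up.
# The model below is a stand-in; warm-up length and poly power are assumed values.
import torch

model = torch.nn.Conv2d(3, 151, kernel_size=1)   # placeholder for the segmentation model
base_lr, epochs = 0.0025, 100
iters_per_epoch = 20210 // 24                    # ~842 iterations per epoch with batch size 24
total_iters = epochs * iters_per_epoch
warmup_iters, power = 1000, 0.9                  # assumptions (not given in the quoted setup)

optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)

def poly_warmup(it):
    """LR multiplier: linear warm-up, then polynomial decay toward zero."""
    if it < warmup_iters:
        return (it + 1) / warmup_iters
    progress = (it - warmup_iters) / max(1, total_iters - warmup_iters)
    return (1.0 - progress) ** power

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=poly_warmup)

# Per-iteration usage inside the training loop:
#   loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```

The scheduler is stepped once per iteration rather than per epoch, which is the usual convention for poly decay in semantic segmentation training loops.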