Decomposed Knowledge Distillation for Class-Incremental Semantic Segmentation
Authors: Donghyeon Baek, Youngmin Oh, Sanghoon Lee, Junghyup Lee, Bumsub Ham
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on standard CISS benchmarks demonstrate the effectiveness of our framework. We demonstrate the effectiveness of our framework with extensive ablation studies on standard CISS benchmarks [11, 30]. |
| Researcher Affiliation | Academia | ¹Yonsei University, ²Korea Institute of Science and Technology (KIST) |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Our code and models are available online: https://cvlab.yonsei.ac.kr/projects/DKD/ |
| Open Datasets | Yes | We use PASCAL VOC [11] and ADE20K [30] datasets for evaluation. PASCAL VOC [11] consists of 10,582 training and 1,449 validation images for 20 object and background classes. ADE20K [30] provides 20,210 and 2,000 images for training and validation, respectively, with 150 object and stuff classes. |
| Dataset Splits | Yes | PASCAL VOC [11] consists of 10,582 training and 1,449 validation images for 20 object and background classes. ADE20K [30] provides 20,210 and 2,000 images for training and validation, respectively, with 150 object and stuff classes. Following the protocol in [5], we use the official validation splits for evaluation. We also exclude 20% of the training sets and use them to tune hyper-parameters (a split sketch is given after the table). |
| Hardware Specification | Yes | We implement our model using PyTorch [24] and train it with four NVIDIA RTX A5000 GPUs. |
| Software Dependencies | No | We implement our model using PyTorch [24]. |
| Experiment Setup | Yes | For PASCAL VOC [11], we train our model for 60 epochs for both initial and incremental steps, with a batch size of 32. We adopt a polynomial learning rate scheduler, where learning rates are set to 0.001 and 0.0001 for initial and incremental steps, respectively. We empirically set γ to 2 during an initial step and 1 for others. For ADE20K [30], we train our model for 100 epochs with a batch size of 24. Following [6], we adopt the poly learning rate scheduler with a linear warm-up [12], where learning rates are set to 0.0025 for an initial step and 0.00025 for incremental ones, respectively. We set γ to 35 for all training steps. For both datasets, we adopt the SGD optimizer with momentum of 0.9, and set α and β to 5 (a minimal training-setup sketch is given after the table). |
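The 20% hold-out for hyper-parameter tuning quoted in the Dataset Splits row could be reproduced along the following lines. This is a minimal sketch, not the authors' code: the stand-in index list, the use of `random_split`, and the fixed seed are assumptions.

```python
# Minimal sketch: hold out 20% of the PASCAL VOC training set for
# hyper-parameter tuning, as described in the paper. The stand-in index
# list and the fixed seed are assumptions, not the authors' implementation.
import torch
from torch.utils.data import random_split

full_train = list(range(10582))               # stand-in for the 10,582 VOC training samples
n_tune = int(0.2 * len(full_train))           # 20% held out for tuning
generator = torch.Generator().manual_seed(0)  # assumed seed for a reproducible split

train_subset, tune_subset = random_split(
    full_train, [len(full_train) - n_tune, n_tune], generator=generator
)
print(len(train_subset), len(tune_subset))    # 8466 2116
```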
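The Experiment Setup row can be summarized as the PyTorch configuration sketched below for the PASCAL VOC initial step. The placeholder model, the iteration count, and the poly power of 0.9 are assumptions; the paper only states the optimizer, momentum, learning rates, epochs, and batch sizes.

```python
# Minimal sketch of the PASCAL VOC training configuration quoted above
# (initial step): SGD with momentum 0.9, batch size 32, 60 epochs, and a
# polynomial ("poly") learning-rate decay. The placeholder model and the
# poly power of 0.9 are assumptions; the paper does not state the power.
import torch

model = torch.nn.Conv2d(3, 21, kernel_size=1)   # placeholder for the segmentation network

epochs, batch_size = 60, 32                     # PASCAL VOC, initial step
base_lr = 0.001                                 # 0.0001 for incremental steps
iters_per_epoch = 10582 // batch_size
max_iters = epochs * iters_per_epoch

optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)

# Poly schedule: lr = base_lr * (1 - iter / max_iters) ** power
power = 0.9                                     # assumed value, common in segmentation
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda it: (1.0 - it / max_iters) ** power
)

for it in range(max_iters):
    # ... forward / backward pass on a mini-batch of 32 crops would go here ...
    optimizer.step()
    scheduler.step()
```

For ADE20K the same recipe applies with 100 epochs, a batch size of 24, learning rates of 0.0025/0.00025, and a linear warm-up prepended to the poly decay (not shown in the sketch).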