TIKP: Text-to-Image Knowledge Preservation for Continual Semantic Segmentation
Authors: Zhidong Yu, Wei Yang, Xike Xie, Zhenbo Shi
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments conducted in the same setting show that TIKP outperforms state-of-the-art methods by a large margin on benchmark datasets. |
| Researcher Affiliation | Academia | Zhidong Yu1, Wei Yang1,2*, Xike Xie1,3*, Zhenbo Shi1,3 1School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China 2Hefei National Laboratory, Hefei 230088, China 3Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou 215123, China qubit@ustc.edu.cn |
| Pseudocode | No | The paper describes methods and formulas but does not provide any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include a statement about releasing the code or a link to a code repository. |
| Open Datasets | Yes | Datasets. We validate our method on three benchmark datasets: Pascal VOC 2012 (Everingham et al. 2010), Cityscapes (Cordts et al. 2016) and ADE20k (Zhou et al. 2017). |
| Dataset Splits | Yes | The Pascal VOC 2012 dataset contains 20 object classes and one background. Its training and validation sets include 10,582 and 1,449 images, respectively. ... The Cityscapes dataset contains 2,975 training images, 500 validation images and 1,525 test images. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for the experiments. |
| Software Dependencies | No | The paper mentions software like 'Deeplabv3', 'ResNet-101', and 'SGD optimizer', but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | For all experiments, as in previous work, we use Deeplabv3 (Chen et al. 2017) as the segmentation network with ResNet-101 (He et al. 2016) as the backbone, which is pre-trained on ImageNet (Deng et al. 2009). The feature distillation is used as PLOP (Douillard et al. 2021). For the Pascal VOC 2012 and ADE20k datasets, the model is trained with a crop size of 512 × 512 and a batch size of 12. The model is trained for 30 epochs on Pascal VOC 2012 and 60 epochs on ADE20k, respectively. For Cityscapes, the model is trained for 50 epochs with a crop size of 800 × 800. Empirically, λ1 is set to 1 and λ2 is set to 10 in experiments. We use the stochastic gradient descent (SGD) optimizer, where the base learning rate is 0.001 with a weight decay of 0.0001. We use the Text-to-Image model to generate 100 images for each class for the Pascal VOC 2012, 50 images for each class for ADE20k, and 50 images for each class for Cityscapes via text prompts. |
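The experiment setup reported in the table can be collected into a single configuration sketch. This is an illustrative reconstruction, not the authors' code: the names `OPTIMIZER`, `LOSS_WEIGHTS`, `CONFIG`, and `steps_per_epoch` are hypothetical, and only the numeric values (crop sizes, batch size, epochs, λ weights, SGD hyperparameters, and generated images per class) come from the paper.

```python
import math

# SGD hyperparameters as reported in the paper.
OPTIMIZER = {"type": "SGD", "base_lr": 0.001, "weight_decay": 0.0001}

# Loss-term weights, set empirically in the paper.
LOSS_WEIGHTS = {"lambda1": 1, "lambda2": 10}

# Per-dataset training schedule and Text-to-Image replay budget.
# Batch size is only stated for Pascal VOC 2012 and ADE20k.
CONFIG = {
    "pascal_voc_2012": {"crop": (512, 512), "batch_size": 12,
                        "epochs": 30, "gen_images_per_class": 100},
    "ade20k":          {"crop": (512, 512), "batch_size": 12,
                        "epochs": 60, "gen_images_per_class": 50},
    "cityscapes":      {"crop": (800, 800),
                        "epochs": 50, "gen_images_per_class": 50},
}

def steps_per_epoch(num_train_images: int, batch_size: int) -> int:
    """Optimizer steps per epoch, assuming a final partial batch is kept."""
    return math.ceil(num_train_images / batch_size)
```

For example, with the 10,582 Pascal VOC 2012 training images quoted under "Dataset Splits" and a batch size of 12, each epoch corresponds to roughly 882 optimizer steps, so the 30-epoch schedule is on the order of 26k iterations.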