KTCN: Enhancing Open-World Object Detection with Knowledge Transfer and Class-Awareness Neutralization

Authors: Xing Xi, Yangyang Huang, Jinhao Lin, Ronghua Luo

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluation results on open-world object detection benchmarks, including MS COCO and Pascal VOC, show that our method achieves nearly 200% of the unknown recall rate of previous state-of-the-art (SOTA) methods, reaching 41.5 U-Recall. Additionally, our approach does not add any extra parameters, maintaining the inference speed advantage of Faster R-CNN, leading the SOTA methods based on deformable DETR at a speed of over 10 FPS. Our code is available at https://github.com/xxyzll/KTCN.
Researcher Affiliation | Academia | Xing Xi, Yangyang Huang, Jinhao Lin and Ronghua Luo*; South China University of Technology; xxyzll@yeah.net, huangyangy@whu.edu.cn, csljh jasper@mail.scut.edu.cn, rhluo@scut.edu.cn
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/xxyzll/KTCN.
Open Datasets | Yes | Consistent with previous works [Ma et al., 2023; Zohar et al., 2023], we use VOC [Everingham et al., 2010] and COCO [Lin et al., 2014] to construct a hybrid dataset.
Dataset Splits | No | The paper specifies training and test dataset sizes but does not explicitly mention a separate validation split or its size.
Hardware Specification | Yes | Our experiments are based on Detectron2 [Wu et al., 2019] with 2 NVIDIA GeForce RTX 4090 GPUs (48 GB total).
Software Dependencies | No | The paper mentions "Detectron2" but does not specify version numbers for it or any other software dependencies.
Experiment Setup | Yes | We set the batch size to 8 and the initial learning rate to 0.005. The maximum number of iterations was set at 120k. When the number of iterations reached 60k and 100k, we reduced the model's current learning rate to one-tenth of its value. In multi-scale training, the shortest side of the image was scaled between 220 and 1088, with the longest side not exceeding 1333. For incremental learning, we set the learning rate to 0.001 (T2, T3) and 0.0005 (T4), the number of warm-up iterations to 500, and the maximum number of iterations to 80k, without reducing the learning rate.
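The step learning-rate schedule quoted above (base LR 0.005, multiplied by one-tenth at 60k and at 100k iterations over a 120k-iteration run) can be sketched as a simple step function. This is an illustrative sketch of the described schedule, not code from the authors' repository; the helper name `step_lr` is hypothetical.

```python
# Illustrative sketch of the step LR schedule described in the
# Experiment Setup row: base LR 0.005, scaled by gamma=0.1 at each
# milestone (60k and 100k iterations) of a 120k-iteration run.
# `step_lr` is a hypothetical helper, not from the KTCN codebase.

def step_lr(iteration, base_lr=0.005, milestones=(60_000, 100_000), gamma=0.1):
    """Return the learning rate in effect at a given training iteration."""
    lr = base_lr
    for m in milestones:
        if iteration >= m:
            lr *= gamma
    return lr

if __name__ == "__main__":
    for it in (0, 59_999, 60_000, 100_000, 119_999):
        print(f"iter {it:>7}: lr = {step_lr(it):.6f}")
```

Since the paper builds on Detectron2, the analogous settings would presumably be expressed through config keys such as SOLVER.BASE_LR, SOLVER.STEPS, SOLVER.GAMMA, SOLVER.MAX_ITER, and INPUT.MIN_SIZE_TRAIN (with range sampling for the 220 to 1088 shortest-side resizing), though the paper does not show its config files.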