Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

OW-VAP: Visual Attribute Parsing for Open World Object Detection

Authors: Xing Xi, Xing Fu, Weiqiang Wang, Ronghua Luo

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Comparative results on the OWOD benchmark demonstrate that our approach surpasses existing state-of-the-art methods with a +13 improvement in U-Recall and a +8 increase in U-AP for unknown detection capabilities. Furthermore, OW-VAP approaches the unknown recall upper limit of the detector. We evaluate OW-VAP on standard OWOD benchmarks, MOWODB and SOWODB, which are composed of a mixture of VOC (Everingham et al., 2010) and COCO (Lin et al., 2014) datasets. In Figure 1, we present the curves of U-Recall and U-AP on the MOWODB benchmark, which are the primary metrics of interest for OWOD. OW-VAP outperforms the previous state-of-the-art (SOTA) methods by a margin of 13+ U-Recall."
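The U-Recall figures quoted above measure the fraction of unknown-class ground-truth boxes that the detector recovers. As a rough illustration of the metric (this is not the paper's evaluation code; the 0.5 IoU threshold and the any-match rule are standard assumptions, not taken from the paper):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def unknown_recall(pred_boxes, unknown_gt_boxes, iou_thr=0.5):
    """Fraction of unknown GT boxes matched by at least one prediction."""
    if not unknown_gt_boxes:
        return 0.0
    matched = sum(
        1 for gt in unknown_gt_boxes
        if any(iou(p, gt) >= iou_thr for p in pred_boxes)
    )
    return matched / len(unknown_gt_boxes)

# Toy example: two unknown GT boxes, the detector recovers one of them.
preds = [(0, 0, 10, 10), (50, 50, 60, 60)]
gts = [(1, 1, 10, 10), (100, 100, 110, 110)]
print(unknown_recall(preds, gts))  # 0.5
```

U-AP is computed analogously to standard average precision but restricted to the unknown class, so it additionally penalizes low-confidence or spurious unknown detections, which U-Recall alone does not.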
Researcher Affiliation | Collaboration | "1School of Computer Science & Engineering, South China University of Technology, Guangzhou, China. 2Ant Group, Hangzhou, China. Correspondence to: Ronghua Luo <EMAIL>, Xing Fu <EMAIL>."
Pseudocode | Yes | "The problem definition of OWOD and pseudocode for the proposed components (VAP, PSLA) are shown in Appendix B and Appendix H, respectively. H. Pseudo Code. The Visual Attribute Parser (VAP, Algorithm 1). ... The Probabilistic Soft Label Assignment (PSLA) algorithm addresses optimization conflicts caused by background interference in pseudo-label generation (Algorithm 2)."
Open Source Code | No | "Code. In the near future, once the code passes the company's review, we will release all the trained code, weights, and models (including other SOTA), along with visualization files at the MOWODB dataset level."
Open Datasets | Yes | "We evaluate OW-VAP on standard OWOD benchmarks, MOWODB and SOWODB, which are composed of a mixture of VOC (Everingham et al., 2010) and COCO (Lin et al., 2014) datasets."
Dataset Splits | Yes | "Table 5. Detailed Description of Dataset Partitioning. Train and test images denote the number of images in the training and test sets, respectively, while the corresponding instances represent the number of test instances. MOWODB utilizes a combination of VOC and COCO datasets, whereas SOWODB exclusively uses the COCO dataset. (a) Semantic category division of MOWODB. (b) Semantic category division of SOWODB."
Hardware Specification | Yes | "All experiments are conducted using 8 V100 GPUs (total 128 GB)."
Software Dependencies | No | "We implement all experiments using MMDetection (Chen et al., 2019)."
Experiment Setup | Yes | "For the text encoder, we use the CLIP model, specifically the version clip-vit-base-patch32. During training, we freeze the text and visual encoders, training only the VAP and known class embeddings. All experiments are conducted using 8 V100 GPUs (total 128 GB). We implement all experiments using MMDetection (Chen et al., 2019). For the experimental parameters, we follow the official settings of YOLO-World, altering only the training epochs: 10 epochs for MOWODB and 2 for SOWODB. F. Ablation Experiment of Hyperparameters. δ: In Equation (5), we introduce the hyperparameter δ. This parameter controls the number of samples selected from the background region. ... n: In Equation (6), we introduce the parameter n. This parameter is utilized to control the number of parameters in the queue. ... b: In Equation (10), we integrate the raw attributes as a rough estimate of the current visual region. ... The performance of the detector as a function of the parameter b is shown in Table 6(c)."
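The setup above freezes the CLIP text and visual encoders and updates only the VAP module and the known-class embeddings. In most frameworks this amounts to toggling a per-parameter trainable flag based on the parameter's module prefix; a minimal framework-agnostic sketch of that selection step (the module names below are illustrative assumptions, not the authors' actual identifiers):

```python
# Prefixes assumed trainable; everything else (e.g. the CLIP text and
# visual encoders) stays frozen. Names are hypothetical, for illustration.
TRAINABLE_PREFIXES = ("vap.", "known_class_embeddings.")

def trainable_mask(param_names):
    """Map each parameter name to True (update) or False (keep frozen)."""
    return {name: name.startswith(TRAINABLE_PREFIXES) for name in param_names}

names = [
    "text_encoder.layer0.weight",
    "visual_encoder.patch_embed.weight",
    "vap.attr_head.weight",
    "known_class_embeddings.weight",
]
mask = trainable_mask(names)
print([n for n, t in mask.items() if t])
# ['vap.attr_head.weight', 'known_class_embeddings.weight']
```

In PyTorch, the equivalent is setting `p.requires_grad = False` on each frozen parameter (or passing only the masked-in parameters to the optimizer), which keeps the frozen encoders out of the backward pass entirely.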