UMB: Understanding Model Behavior for Open-World Object Detection
Authors: Xing Xi, Yangyang Huang, Zhijie Zhong, Ronghua Luo
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The evaluation results on the Real-World Object Detection (RWD) benchmark, which consists of five real-world application datasets, show that we surpassed the previous state-of-the-art (SOTA) with an absolute gain of 5.3 mAP for unknown classes, reaching 20.5 mAP. |
| Researcher Affiliation | Academia | Xing Xi, Yangyang Huang, Zhijie Zhong, Ronghua Luo, School of Computer Science and Engineering, South China University of Technology, Guangzhou, China 510006. Corresponding author: rhluo@scut.edu.cn. |
| Pseudocode | Yes | Algorithm 1: Textual Attribute Generation and Known Class Prediction |
| Open Source Code | Yes | Our code is available at https://github.com/xxyzll/UMB. |
| Open Datasets | Yes | The OWOD benchmark is established on the VOC[31] and COCO[30] datasets. ... The RWD benchmark consists of five typical application scenarios for object detection, including underwater scenes, representing visual blurring caused by the environment (Aquatic[32]); aerial scenes, where the targets are small and difficult to distinguish (Aerial[33]); scenarios using synthetic data when data is lacking (Game[34]); medical X-ray scenes, where it is difficult to distinguish between categories and professional knowledge is required (Medical[35]); and human surgery scenes, where the field of view is blurred by blood (Surgery[36]). |
| Dataset Splits | No | We divide RWD into two subtasks according to a 50% category ratio. When training in Task 1, all categories in the test set that belong to Task 2 are treated as unknown classes, and when training in Task 2, the categories of Task 1 are considered as previously seen classes. (See the split sketch below the table.) |
| Hardware Specification | Yes | All experiments were conducted using a single NVIDIA GeForce RTX 4090 GPU. |
| Software Dependencies | No | The large language model used for attribute generation is GPT-3.5. All optimizers used AdamW. |
| Experiment Setup | Yes | During the attribute selection phase, BCE was the loss function, and the learning rate remained constant without decreasing with iterations. ... This phase used MSE as the loss function, with a maximum of 1000 iterations. ... the learning rate and maximum number of iterations for training were set to three values ([1e-5, 5e-5, 1e-4], [1, 10, 100]). ... In the distribution optimization phase, we set the window value to 10. ... During training, Adam was used as the optimizer, the learning rate was set to 0.01, the maximum number of iterations was 10000, and the maximum number of probability models was set to 5. (See the configuration sketch below the table.) |
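
The Dataset Splits row describes partitioning the RWD categories 50/50 into two tasks, with Task 2 categories acting as unknowns while training Task 1. A minimal sketch of that split, using invented placeholder class names rather than the actual RWD categories:

```python
# Hypothetical illustration of the 50% category split between RWD tasks.
# The class names below are placeholders, not the actual RWD categories.
categories = sorted(["fish", "coral", "diver", "boat", "buoy", "net"])

half = len(categories) // 2
task1_known = set(categories[:half])    # trained on during Task 1
task2_classes = set(categories[half:])  # treated as unknown while evaluating Task 1

# When training Task 2, the Task 1 categories count as previously seen classes.
previously_seen = task1_known
print(task1_known, task2_classes)
```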
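The Experiment Setup row reports losses, learning rates, and iteration budgets for the paper's training phases. Below is a minimal sketch of how that configuration might be wired up in PyTorch, under stated assumptions: the module shapes, dummy tensors, and phase structure are placeholders for illustration, not the authors' implementation (see the UMB repository for that).

```python
# Sketch of the reported training configuration; all layer sizes and
# dummy data are assumptions, only the hyperparameters come from the table.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Attribute selection phase: BCE loss, constant (non-decayed) learning rate,
# with grids of [1e-5, 5e-5, 1e-4] learning rates and [1, 10, 100] iterations.
selector = nn.Linear(512, 64)                      # placeholder sizes
bce = nn.BCEWithLogitsLoss()
selector_opt = torch.optim.AdamW(selector.parameters(), lr=1e-4)
for _ in range(100):                               # one budget from the grid
    feats = torch.randn(32, 512)                   # stand-in features
    labels = torch.randint(0, 2, (32, 64)).float()
    selector_opt.zero_grad()
    bce(selector(feats), labels).backward()
    selector_opt.step()

# Known-class prediction phase: MSE loss, at most 1000 iterations.
predictor = nn.Linear(64, 1)
mse = nn.MSELoss()
predictor_opt = torch.optim.AdamW(predictor.parameters(), lr=1e-4)
for _ in range(1000):
    attrs = torch.randn(32, 64)
    scores = torch.rand(32, 1)
    predictor_opt.zero_grad()
    mse(predictor(attrs), scores).backward()
    predictor_opt.step()

# Distribution optimization phase, as reported: Adam optimizer, lr 0.01,
# up to 10000 iterations, window value 10, at most 5 probability models.
WINDOW, MAX_PROB_MODELS, MAX_ITERS = 10, 5, 10_000
dist_model = nn.Linear(64, MAX_PROB_MODELS)        # placeholder head
dist_opt = torch.optim.Adam(dist_model.parameters(), lr=0.01)
```

Note the table itself is slightly inconsistent on the optimizer (AdamW under Software Dependencies, Adam for the distribution phase under Experiment Setup); the sketch reproduces both statements as reported.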